Importing Wikipedia in Plone
Eric BREHAULT – Plone Conference 2013
Makina Corpus
October 02, 2013
Transcript
Importing Wikipedia in Plone Eric BREHAULT – Plone Conference 2013
ZODB is good at storing objects
• Plone contents are objects,
• we store them in the ZODB,
• everything is fine, end of the story.
But what if...
...we want to store non-contentish records? Like polls, statistics, mailing-list subscribers, etc., or any business-specific structured data.
Store them as contents anyway
That is a powerful solution, but there are two major problems...
Problem 1: You need to manage a secondary system
• you need to deploy it,
• you need to back it up,
• you need to secure it,
• etc.
Problem 2: I hate SQL
No explanation here. I think I just cannot digest it...
How to store many records in the ZODB?
• Is the ZODB strong enough?
• Is the ZCatalog strong enough?
My grandmother often told me: "If you want to become stronger, you have to eat your soup."
Where do we find a good soup for Plone? In a super souper!!!
souper.plone and souper
• They provide both storage and indexing.
• A record can store any persistent, picklable data.
• Created by BlueDynamics.
• Based on ZODB BTrees, node.ext.zodb, and repoze.catalog.
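Queries (shown below) only return results if the soup has indexes. souper obtains them from a catalog factory registered as a named utility under the soup's name; here is a minimal sketch following souper's documented pattern, with index names matching the record attributes used in the next slides:

# Sketch: catalog factory for the 'mysoup' soup used below.
from zope.interface import implementer
from zope.component import provideUtility
from repoze.catalog.catalog import Catalog
from repoze.catalog.indexes.field import CatalogFieldIndex
from repoze.catalog.indexes.text import CatalogTextIndex
from repoze.catalog.indexes.keyword import CatalogKeywordIndex
from souper.interfaces import ICatalogFactory
from souper.soup import NodeAttributeIndexer

@implementer(ICatalogFactory)
class MySoupCatalogFactory(object):
    def __call__(self, context=None):
        # One index per queryable record attribute.
        catalog = Catalog()
        catalog[u'user'] = CatalogFieldIndex(NodeAttributeIndexer('user'))
        catalog[u'text'] = CatalogTextIndex(NodeAttributeIndexer('text'))
        catalog[u'keywords'] = CatalogKeywordIndex(NodeAttributeIndexer('keywords'))
        return catalog

# Register it under the soup name (in Plone this is usually done via ZCML).
provideUtility(MySoupCatalogFactory(), name='mysoup')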
Add a record
>>> from souper.soup import get_soup
>>> from souper.soup import Record
>>> soup = get_soup('mysoup', context)
>>> record = Record()
>>> record.attrs['user'] = 'user1'
>>> record.attrs['text'] = u'foo bar baz'
>>> record.attrs['keywords'] = [u'1', u'2', u'ü']
>>> record_id = soup.add(record)
Record in record
A record can contain other records, stored through its attrs:
>>> record.attrs['homeaddress'] = Record()
>>> record.attrs['homeaddress'].attrs['zip'] = '6020'
>>> record.attrs['homeaddress'].attrs['town'] = 'Innsbruck'
>>> record.attrs['homeaddress'].attrs['country'] = 'Austria'
Access a record
>>> from souper.soup import get_soup
>>> soup = get_soup('mysoup', context)
>>> record = soup.get(record_id)
Query
>>> from repoze.catalog.query import Eq, Contains
>>> [r for r in soup.query(Eq('user', 'user1') & Contains('text', 'foo'))]
[<Record object 'None' at ...>]
or using the CQE format:
>>> [r for r in soup.query("user == 'user1' and 'foo' in text")]
[<Record object 'None' at ...>]
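Updating a record does not refresh the indexes by itself; you ask the soup to reindex. A short sketch of the update/delete cycle, based on souper's documented API and the soup from the examples above:

# Update an attribute, then refresh the catalog for just that record.
record = soup.get(record_id)
record.attrs['text'] = u'foo bar updated'
soup.reindex(records=[record])

# soup.reindex() reindexes all records; soup.rebuild() recreates the
# catalog from the factory (use it after changing the index definitions).

# Deleting a record also removes it from the indexes.
del soup[record]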
souper
• a soup container can be moved to a specific ZODB mount point,
• it can be shared across multiple independent Plone instances,
• souper works on Plone and Pyramid.
Plomino & souper • we use Plomino to build non-content
oriented apps easily, • we use souper to store huge amount of application data.
Plomino data storage
Originally, documents (= records) were ATFolders. Capacity: about 30,000.
Plomino data storage
Since 1.14, documents are pure CMF objects. Capacity: about 100,000. Usually the Plomino ZCatalog contains a lot of indexes.
Plomino & souper With souper, documents are just soup records.
Capacity: several millions.
Typical use case
• store 500,000 addresses,
• query them in full text and display the results on a map (see the sketch below).
Demo
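Here is how that query could look with souper; the soup name 'addresses', the 'fulltext' text index, and the 'lat'/'lon' attributes are hypothetical names for this sketch, not from the talk:

from repoze.catalog.query import Contains
from souper.soup import get_soup

soup = get_soup('addresses', context)

# Full-text search over a hypothetical 'fulltext' text index...
hits = soup.query(Contains('fulltext', 'main street'))

# ...then serialize the matching records as GeoJSON for the web map.
features = [{
    'type': 'Feature',
    'geometry': {'type': 'Point',
                 'coordinates': [r.attrs['lon'], r.attrs['lat']]},
    'properties': {'label': r.attrs['fulltext']},
} for r in hits]
geojson = {'type': 'FeatureCollection', 'features': features}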
What is the limit? Can we import Wikipedia into souper?
Demo with 400,000 records. Demo with 5.5 million records.
Conclusion
• usage performance is good,
• Plone performance is not impacted.
Use it!
Thoughts
• What about a REST API on top of it?
• Massive imports are long and difficult; could they be improved? (see the batching sketch below)
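One common mitigation, an assumption on my part rather than the talk's method, is to commit the ZODB transaction in batches during the import, which keeps the memory footprint flat and makes a long run resumable:

import transaction
from souper.soup import get_soup, Record

soup = get_soup('mysoup', context)

# 'rows' stands for your parsed input stream (hypothetical).
for i, row in enumerate(rows):
    record = Record()
    record.attrs['text'] = row['text']
    soup.add(record)  # adds the record and indexes it
    if i % 10000 == 0:
        transaction.commit()  # flush a batch to disk
transaction.commit()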
Makina Corpus
For all questions related to this talk, please contact Éric Bréhault.
[email protected]
Tel: +33 534 566 958
www.makina-corpus.com