Slide 1

Slide 1 text

Florence 20 April 2018 PAOLO MELCHIORRE Full-text Search with PostgreSQL DjangoProject.com

Slide 2

Slide 2 text

2 Paolo Melchiorre Computer Science Engineer Python Developer ~ 2006 Django Developer ~ 2011 Senior Developer @ 20Tab Remote Worker PostgreSQL user (not DBA)

Slide 3

Slide 3 text

3 Outline ● Basic Search ● Full-text search ● Django & PostgreSQL ● Community ● Search in djangoproject.com

Slide 4

Slide 4 text

4 Goal ● Full-text Search ● PostgreSQL ● Django ● Add/Update search ● FLOSS Contribution

Slide 5

Slide 5 text

5 Search Experiences ● SQL & RDBMS ● Plone & ZODB ● SQLAlchemy ORM ● Django ORM ● Psycopg

Slide 6

Slide 6 text

6 Full-text Search Experiences ● Elastic Search ● Apache SOLR ● Apache Lucene ● Pros ● Cons

Slide 7

Slide 7 text

7 Full-text Search “… techniques for Searching … computer-stored Document … … in a Full-text Database …” -- Wikipedia

Slide 8

Slide 8 text

8 Document “… the Unit of searching in a Full-text Search System; for example, a magazine Article …” -- PostgreSQL

Slide 9

Slide 9 text

9 PostgreSQL Full-text Search ● 10-year Support ● Dedicated Data Types ● Special Indexes ● Phrase Search ● JSON[b] support

Slide 10

Slide 10 text

10 Full-text Search in Django ● django.contrib.postgres ● Since Django 1.10 ● BRIN and GIN indexes ● Out of the box feature ● Quite complete solution

Slide 11

Slide 11 text

11 ConcertiaRoma.com Search ● 15.000 Bands ● 20.000 Events ● 1.000 Venues ● Django 2.0 ● Trigram Similarity ● Full-text Search

Slide 12

Slide 12 text

12 Full-text Search Talks ● PyCon Otto ● EuroPython 2017 ● PGDay.it 2017 ● PyRoma Meetup ● paulox.net

Slide 13

Slide 13 text

13 DjangoProject.com Search ● Elastic-based Search ● English-only search ● HTML tag results ● Complex local environment ● Rewrite proposal

Slide 14

Slide 14 text

14 Django Developers Feedback CONS ● Work ● Features ● Hosting ● Database load PROS ● No ext service ● Maintenance ● Setup ● Dogfooding

Slide 15

Slide 15 text

15 DjangoProject.com Code ● Django 1.11 ● PostgreSQL 9.5 ● Sphinx 1.6 ● Elasticsearch 5.0 ● Not self-contained

Slide 16

Slide 16 text

16 EuroPython 2017 Sprints ● Environment test ● Bug reporting ● Elastic removal ● Full-text search ● Working but slow

Slide 17

Slide 17 text

17 Sprint Code Rewrite ● Commands restored ● Search Vector Field ● GIN Index ● Trigram similarity ● JSON Field

Slide 18

Slide 18 text

18 Pull Request and Deploy ● Pull Request #809 ● Tim and Florian review ● Tests additions ● Migrations squash ● Re-index Transaction ● Commits clean-up

Slide 19

Slide 19 text

19 DjangoProject.com ● Exact search ● Strip html tag ● Self-contained ● Quite fast ● 7 versions ● 10 languages

Slide 20

Slide 20 text

20 Multilingual search ● 15 languages ● Default simple ● Config field ● Query optimizations ● Specific tests

Slide 21

Slide 21 text

21 Pull Request & Django 2.0 ● Pull Request #813 ● Django 2.0 required ● Pull Request #794 ● Blocked by Django-Push ● Pull Request #24

Slide 22

Slide 22 text

22 What’s next ● Misspelling support ● Search suggestions ● Highlighting Results ● Searches statistic ● PostgreSQL 10

Slide 23

Slide 23 text

23 Conclusions ● Simplified Infrastructure ● Sped up Update ● Features preservation ● Search improvements ● Django/PostgreSQL only

Slide 24

Slide 24 text

24 Full-text Search To-Do list 1) PostgreSQL/Django Versions 2) Languages requested 3) Document to Search Vector 4) Query to Search Query 5) Search Vector Field & Indexes

Slide 25

Slide 25 text

25 Acknowledgements Tim Graham github.com/timgraham twentytab www.20tab.com

Slide 26

Slide 26 text

26 Resources  en.wikipedia.org/wiki/full-text_search  postgresql.org/docs/current/static/textsearch.html dj docs.djangoproject.com/en/dev/topics/db/search  stackoverflow.com/tags/full-text-search  github.com/django  git.postgresql.org

Slide 27

Slide 27 text

27 Thank you  License ( BY SA)   creativecommons.org/licenses/by-sa/4.0/  Source Code github.com/django/djangoproject.com  Slides speakerdeck.com/pauloxnet

Slide 28

Slide 28 text

28 Questions  stackoverflow.com/u/755343

Slide 29

Slide 29 text

29 Contacts  www.paulox.net  twitter.com/pauloxnet  linkedin.com/in/paolomelchiorre  github.com/pauloxnet