Slide 1

Slide 1 text

Postgres Full Text Search in Your Rails App

Slide 2

Slide 2 text

• Why, Pros and Cons • A bit of Postgres • Under the Hood • PgSearch • Search Scopes • Multi search • Performance, Indexes, etc

Slide 3

Slide 3 text

Why?

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

https://github.com/ PublicLibraryAssoc/cpldl

Slide 6

Slide 6 text

Wanted:

Slide 7

Slide 7 text

Wanted: Full Text Search without external programs

Slide 8

Slide 8 text

Wanted: Full Text Search without external programs No additional server infrastructure

Slide 9

Slide 9 text

Wanted: Full Text Search without external programs No additional server infrastructure Works with AWS RDS

Slide 10

Slide 10 text

A bit of Postgres:

Slide 11

Slide 11 text

A bit of Postgres: Evolved from Ingres

Slide 12

Slide 12 text

A bit of Postgres: Evolved from Ingres Open sourced in 1995

Slide 13

Slide 13 text

Under the Hood: Two key data types

Slide 14

Slide 14 text

Under the Hood: Two key data types ts_vector

Slide 15

Slide 15 text

to_tsvector(config, text) outputs a normalized tsvector

Slide 16

Slide 16 text

SELECT to_tsvector(‘english’, ‘Hello Boulder Ruby’) -> boulder,hello,rubi

Slide 17

Slide 17 text

SELECT to_tsvector(‘english’, ‘Hello Boulder Ruby’) -> boulder, hello, rubi W AT.

Slide 18

Slide 18 text

Under the Hood: Two key data types ts_vector ts_query

Slide 19

Slide 19 text

SELECT to_tsquery(‘english’, ‘ruby’) -> rubi

Slide 20

Slide 20 text

Operators: @@ vector @@ query && || vector @@ (A && B) vector @@ (A || B)

Slide 21

Slide 21 text

SELECT body FROM comments WHERE to_tsvector (‘english, body) @@ ( to_tsquery(‘english’, ‘ruby’) && to_tsquery(‘english’, ‘rails’) );

Slide 22

Slide 22 text

A Word of Warning: Never pass user input directly to ts_query.

Slide 23

Slide 23 text

DIY: class Comment < ActiveRecord::Base scope :search_all -> (query) { where (“to_tsvector(‘english’, body) @@ #{sanitize_query(query)}”) } Comment.search_all(“ruby”)

Slide 24

Slide 24 text

pg_search

Slide 25

Slide 25 text

pg_search Developed by Pivotal’s Casebook dev team. Inspired by Aaron Patterson’s Textactular

Slide 26

Slide 26 text

Install: gem ‘pg_search’ include PgSearch

Slide 27

Slide 27 text

Pro-Tip: config/application.rb config.active_record.schema_format = :sql

Slide 28

Slide 28 text

Building a Search Scope pg_search_scope :search_comments, against: :body pg_search_scope :search_all, against: [:body, :title]

Slide 29

Slide 29 text

Dynamic Search Scope class Person < ActiveRecord::Base include PgSearch pg_search_scope :search_by_name, lambda do |name_part, query| raise ArgumentError unless [:first, :last].include?(name_part) { :against => name_part, :query => query } end end person_1 = Person.create!(:first_name => “Grant", :last_name => "Hill") person_2 = Person.create!(:first_name => “Hugh", :last_name => "Grant") Person.search_by_name :first, "Grant" # => [person_1] Person.search_by_name :last, "Grant" # => [person_2]

Slide 30

Slide 30 text

Searching Through Associations Quickly implement cross-model searching

Slide 31

Slide 31 text

Searching Through Associations Quickly implement cross-model searching Heads up • Lookup speed can’t be improved via database indexes.

Slide 32

Slide 32 text

Searching Through Associations pg_search_scope :search_users, against: [:email], associated_against: { profile: [:first_name], roles: [:name] }, using: { tsearch: { prefix: true }, dmetaphone: { any_word: true }, trigram: { threshold: 0.1 } } https://github.com/PublicLibraryAssoc/cpldl/blob/master/app/models/user.rb#L31-L36

Slide 33

Slide 33 text

Additional pg_search features • :tsearch • :trigram • :dmetaphone

Slide 34

Slide 34 text

Additional pg_search features def up execute 'CREATE EXTENSION pg_trgm;' execute 'CREATE EXTENSION fuzzystrmatch;' end

Slide 35

Slide 35 text

:tsearch • The standard full text search implementation

Slide 36

Slide 36 text

:tsearch • The standard full text search implementation • Supports Weighting, Prefix searches, and Stemming

Slide 37

Slide 37 text

:tsearch • Weighting Allows you to assign a rank of A, B, C, or D A -> D is High -> Low importance. pg_search_scope :search_full_text, against: { title: 'A', subtitle: 'B', content: 'C' }

Slide 38

Slide 38 text

:tsearch • Prefix Allows you to match on partial terms pg_search_scope :whose_name_starts_with, against: :name, using: { tsearch: { prefix: true } }

Slide 39

Slide 39 text

:tsearch • Prefix Allows you to match on partial terms pg_search_scope :whose_name_starts_with, against: :name, using: { tsearch: { prefix: true } }

Slide 40

Slide 40 text

:tsearch • Dictionaries and Stemming Stemming removes common endings, like -ing, -y, and -er. Normalizes the search lexemes.

Slide 41

Slide 41 text

:tsearch • Dictionaries and Stemming pg_search_scope :search_text, against: :text, using: { tsearch: { dictionary: “english” } }

Slide 42

Slide 42 text

:tsearch • :any_word pg_search_scope :search_users, against: [:email], associated_against: { profile: [:first_name], roles: [:name] }, using: { dmetaphone: { any_word: true }, }

Slide 43

Slide 43 text

:dmetaphone • Install the contrib package def up execute 'CREATE EXTENSION fuzzystrmatch;' end • Add the migration rails g pg_search:migration:dmetaphone • Matches “Geoff” and “Jeff”

Slide 44

Slide 44 text

:trigram • Install the contrib package def up execute 'CREATE EXTENSION pg_trgm;' end • Creates three letter substrings. Boulder Ruby -> [“ Bo”, “Bou”, “oul”, “uld”, “lde”, “der”, “er “, “r R”, “ Ru”, “Rub”, “uby”, “by “, “y “]

Slide 45

Slide 45 text

:trigram • :threshold pg_search_scope :search_users, against: [:email], associated_against: { profile: [:first_name], roles: [:name] }, using: { trigram: { threshold: 0.1 } }

Slide 46

Slide 46 text

Limiting features on Fields • :only pg_search_scope :search_users, against: [:email], associated_against: { profile: [:first_name], roles: [:name] }, using: { trigram: only: [:email], threshold: 0.5 } }

Slide 47

Slide 47 text

Unaccenting • Add migration for contrib package def up execute 'CREATE EXTENSION unaccent;' end • Add option pg_search_scope :gringo_search, against: :word, ignoring: :accents

Slide 48

Slide 48 text

Unaccenting Heads Up • Requires Postgres 9.0 or newer • Prevents indexing Alternatives • Use Trigram + GIN index • Using an SQL incantation to wrap or replace the function and make it immutable http://stackoverflow.com/questions/11005036/does-postgresql-support-accent-insensitive-collations/11007216#11007216

Slide 49

Slide 49 text

Multi-document Search • Add migration to enable pg_search_documents table rails g pg_search:migration:multisearch • Add multisearchable to your model class Course < ActiveRecord::Base multisearchable against: [:title, :summary, :description, :topics_str, :level]

Slide 50

Slide 50 text

Multi-document Search • Indexing existing data PgSearch::Multisearch.rebuild(User) or a rake task rake pg_search:multisearch:rebuild[User]

Slide 51

Slide 51 text

Multi-document Search • Multi searchable Associations User has_one :pg_search_document PgSearchDocument belongs_to :searchable, polymorphic: true

Slide 52

Slide 52 text

Multi-document Search • Searching and Finding Records PgSearch.multisearchable(“ruby”) => Returns a ActiveRecord::Relation of PgSearch::Document => Can call .searchable on each PgSearch::Document to get the original resource. • Configuration is the same as pg_search_scope

Slide 53

Slide 53 text

Performance and Indexing • You can create a “tsvector” type column to store lexemes. • One tsvector column is needed for each searching method. • You’ll also need to create a trigger function to update the tsvector column when content changes in the original model object. • SQL schema format required.

Slide 54

Slide 54 text

Performance and Indexing pg_search_scope :fast_content_search, against: :content, using: { dmetaphone: { tsvector_column: 'tsvector_content_dmetaphone' }, tsearch: { dictionary: 'english', tsvector_column: 'tsvector_content_tsearch' } trigram: {} # trigram does not use tsvectors } https://robots.thoughtbot.com/optimizing-full-text-search-with-postgres-tsvector-columns-and-triggers

Slide 55

Slide 55 text

GIN vs GiST Index GIN • Generalized Inverted Index • Must be “tsvector” data type GiST • Generalized Search Tree • Can be “tsvector” or “tsquery” data type

Slide 56

Slide 56 text

GIN vs GiST Index GIN • ~3x faster lookup than GiST • ~2-3x larger than GiST GiST • ~3x faster for initial build than GIN • Moderately faster to update than GIN

Slide 57

Slide 57 text

GIN vs GiST Index GIN • Best for static data • Better for >100,000 unique lexemes GiST • Faster to update, so good for highly dynamic data • Better for <100,000 unique lexemes

Slide 58

Slide 58 text

Thank You! @shepbook [email protected] Special Thanks: • Grant Hutchins, Pivotal and the pg_search contributors • Adam Sanderson “Postgres: The best tool you’re already using.” (Talk on Confreaks.) • Dan Croak, Thoughtbot “Optimizing Full Text Search with Postgres tsvector Columns and Triggers” (Blog Post)