Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Postgres Full Text Search with pg_search

Postgres Full Text Search with pg_search

A high level overview of using Posgtres for full text search using the pg_search gem.

Jared Koumentis

December 16, 2015
Tweet

More Decks by Jared Koumentis

Other Decks in Programming

Transcript

  1. • Why, Pros and Cons • A bit of Postgres

    • Under the Hood • PgSearch • Search Scopes • Multi search • Performance, Indexes, etc
  2. SELECT body FROM comments WHERE to_tsvector (‘english, body) @@ (

    to_tsquery(‘english’, ‘ruby’) && to_tsquery(‘english’, ‘rails’) );
  3. DIY: class Comment < ActiveRecord::Base scope :search_all -> (query) {

    where (“to_tsvector(‘english’, body) @@ #{sanitize_query(query)}”) } Comment.search_all(“ruby”)
  4. Dynamic Search Scope class Person < ActiveRecord::Base include PgSearch pg_search_scope

    :search_by_name, lambda do |name_part, query| raise ArgumentError unless [:first, :last].include?(name_part) { :against => name_part, :query => query } end end person_1 = Person.create!(:first_name => “Grant", :last_name => "Hill") person_2 = Person.create!(:first_name => “Hugh", :last_name => "Grant") Person.search_by_name :first, "Grant" # => [person_1] Person.search_by_name :last, "Grant" # => [person_2]
  5. Searching Through Associations Quickly implement cross-model searching Heads up •

    Lookup speed can’t be improved via database indexes.
  6. Searching Through Associations pg_search_scope :search_users, against: [:email], associated_against: { profile:

    [:first_name], roles: [:name] }, using: { tsearch: { prefix: true }, dmetaphone: { any_word: true }, trigram: { threshold: 0.1 } } https://github.com/PublicLibraryAssoc/cpldl/blob/master/app/models/user.rb#L31-L36
  7. :tsearch • Weighting Allows you to assign a rank of

    A, B, C, or D A -> D is High -> Low importance. pg_search_scope :search_full_text, against: { title: 'A', subtitle: 'B', content: 'C' }
  8. :tsearch • Prefix Allows you to match on partial terms

    pg_search_scope :whose_name_starts_with, against: :name, using: { tsearch: { prefix: true } }
  9. :tsearch • Prefix Allows you to match on partial terms

    pg_search_scope :whose_name_starts_with, against: :name, using: { tsearch: { prefix: true } }
  10. :tsearch • Dictionaries and Stemming Stemming removes common endings, like

    -ing, -y, and -er. Normalizes the search lexemes.
  11. :tsearch • :any_word pg_search_scope :search_users, against: [:email], associated_against: { profile:

    [:first_name], roles: [:name] }, using: { dmetaphone: { any_word: true }, }
  12. :dmetaphone • Install the contrib package def up execute 'CREATE

    EXTENSION fuzzystrmatch;' end • Add the migration rails g pg_search:migration:dmetaphone • Matches “Geoff” and “Jeff”
  13. :trigram • Install the contrib package def up execute 'CREATE

    EXTENSION pg_trgm;' end • Creates three letter substrings. Boulder Ruby -> [“ Bo”, “Bou”, “oul”, “uld”, “lde”, “der”, “er “, “r R”, “ Ru”, “Rub”, “uby”, “by “, “y “]
  14. :trigram • :threshold pg_search_scope :search_users, against: [:email], associated_against: { profile:

    [:first_name], roles: [:name] }, using: { trigram: { threshold: 0.1 } }
  15. Limiting features on Fields • :only pg_search_scope :search_users, against: [:email],

    associated_against: { profile: [:first_name], roles: [:name] }, using: { trigram: only: [:email], threshold: 0.5 } }
  16. Unaccenting • Add migration for contrib package def up execute

    'CREATE EXTENSION unaccent;' end • Add option pg_search_scope :gringo_search, against: :word, ignoring: :accents
  17. Unaccenting Heads Up • Requires Postgres 9.0 or newer •

    Prevents indexing Alternatives • Use Trigram + GIN index • Using an SQL incantation to wrap or replace the function and make it immutable http://stackoverflow.com/questions/11005036/does-postgresql-support-accent-insensitive-collations/11007216#11007216
  18. Multi-document Search • Add migration to enable pg_search_documents table rails

    g pg_search:migration:multisearch • Add multisearchable to your model class Course < ActiveRecord::Base multisearchable against: [:title, :summary, :description, :topics_str, :level]
  19. Multi-document Search • Searching and Finding Records PgSearch.multisearchable(“ruby”) => Returns

    a ActiveRecord::Relation of PgSearch::Document => Can call .searchable on each PgSearch::Document to get the original resource. • Configuration is the same as pg_search_scope
  20. Performance and Indexing • You can create a “tsvector” type

    column to store lexemes. • One tsvector column is needed for each searching method. • You’ll also need to create a trigger function to update the tsvector column when content changes in the original model object. • SQL schema format required.
  21. Performance and Indexing pg_search_scope :fast_content_search, against: :content, using: { dmetaphone:

    { tsvector_column: 'tsvector_content_dmetaphone' }, tsearch: { dictionary: 'english', tsvector_column: 'tsvector_content_tsearch' } trigram: {} # trigram does not use tsvectors } https://robots.thoughtbot.com/optimizing-full-text-search-with-postgres-tsvector-columns-and-triggers
  22. GIN vs GiST Index GIN • Generalized Inverted Index •

    Must be “tsvector” data type GiST • Generalized Search Tree • Can be “tsvector” or “tsquery” data type
  23. GIN vs GiST Index GIN • ~3x faster lookup than

    GiST • ~2-3x larger than GiST GiST • ~3x faster for initial build than GIN • Moderately faster to update than GIN
  24. GIN vs GiST Index GIN • Best for static data

    • Better for >100,000 unique lexemes GiST • Faster to update, so good for highly dynamic data • Better for <100,000 unique lexemes
  25. Thank You! @shepbook [email protected] Special Thanks: • Grant Hutchins, Pivotal

    and the pg_search contributors • Adam Sanderson “Postgres: The best tool you’re already using.” (Talk on Confreaks.) • Dan Croak, Thoughtbot “Optimizing Full Text Search with Postgres tsvector Columns and Triggers” (Blog Post)