Pro Yearly is on sale from $80 to $50! »

Multi-table Full Text Search in Postgres

A438eb5b27da0f50dc120f9bfbdd9c16?s=47 Caleb Hearth
October 14, 2014

Multi-table Full Text Search in Postgres

A438eb5b27da0f50dc120f9bfbdd9c16?s=128

Caleb Hearth

October 14, 2014
Tweet

Transcript

  1. Multi-table Full Text Search Postgres with

  2. @calebthompson calebthompson.io

  3. None
  4. None
  5. I'm going to talk to you about

  6. A real-life feature.

  7. Full Text Search

  8. Multi-table Full Text Search Postgres with

  9. Views

  10. Performance

  11. Materialized Views

  12. Gems

  13. Other Options

  14. Let's search for Articles

  15. Article.where("body LIKE %?%", query)

  16. Exact substrings

  17. but that's not very useful

  18. Article.where("body ILIKE %?%", query)

  19. Search on title

  20. Article.where( "body ILIKE %?% OR title ILIKE %?%", query, query)

  21. Search by Author's name

  22. Scopes def self.search(query) joins(:user) .where(<<-SQL, query, query, query) articles.body ILIKE

    %?% OR articles.title ILIKE %?% OR users.name ILIKE %?% SQL end
  23. Query object class Search def self.for(query) [ Article.where("title ILIKE %?%",

    query), Article.where("body ILIKE %?%", query), Article.joins(:user) .where("users.name ILIKE %?%", query), ].flatten.uniq end end
  24. Poor results

  25. Full Text Search Enter

  26. natural language searching

  27. remove stop words and, the, also, they, would

  28. eliminate casing “Factory” and “factory” should return the same results

  29. synonyms hunger and hungrily

  30. stemming “try” and “trying”, “tries”, and “tried” will be recorded

    in the index under the single concept word "tri."
  31. SELECT DISTINCT(id) FROM ( SELECT id AS id, title ||

    ' ' || body AS the_text FROM articles UNION SELECT articles.id AS id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id ) AS this_doesnt_matter WHERE to_tsvector(the_text) @@ to_tsquery('?');
  32. SELECT DISTINCT(id) FROM ( SELECT id AS id, title ||

    ' ' || body AS the_text FROM articles UNION SELECT articles.id AS id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id
  33. id AS id, title || ' ' || body AS

    the_text FROM articles UNION SELECT articles.id AS id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id
  34. SELECT DISTINCT(id) FROM ( SELECT id AS id, title ||

    ' ' || body A FROM articles UNION SELECT articles.id AS id, authors.name AS the_te
  35. title || ' ' || body A FROM articles UNION

    SELECT articles.id AS id, authors.name AS the_te FROM authors JOIN articles ON authors.id = articles ) AS this_doesnt_matter WHERE
  36. SELECT DISTINCT(id) FROM ( SELECT id AS id, title ||

    ' ' || body AS t FROM articles UNION SELECT
  37. That's a lot of SQL.

  38. We could throw that into our query object class Search

    def self.for(query) <<-SQL SQL end end
  39. Scopes def self.search(query) where(<<-SQL, query) SQL end

  40. Postgres has our answer

  41. Views

  42. Partial queries

  43. Stored in the Database

  44. Can be SELECTed from

  45. Return set of columns

  46. Multi- source

  47. Complete the query

  48. CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id) users.*, activities.created_at

    AS active_at FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days';
  49. CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id) users.*, activities.created_at

    AS active_at FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days';
  50. CREATE VIEW users_with_recent_activity SELECT DISTINCT ON (users.id) users.*, activities.created_at AS

    active FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days
  51. CREATE VIEW users_with_recent AS SELECT DISTINCT ON (users.id) users.*, activities.created_at

    active_at FROM users JOIN activities ON activities.user_id = u
  52. VIEW users_with_recent_activity AS DISTINCT ON (users.id) users.*, activities.created_at AS active_at

    users activities activities.user_id = users.id activities.created_at >= CURRENT_DATE - interval '7 days';
  53. CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id) users.*, activities.created_at

    AS active_at FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days';
  54. Querying a view looks just like querying a table SELECT

    * FROM users_with_recent_activity WHERE id IN (1,2,3…) ORDER BY active_at DESC
  55. ActiveRecord can use a view as its backend

  56. So we can create a fairly vanilla model

  57. class UserWithRecentActivity \ < ActiveRecord::Base def self.table_name "users_with_recent_activity" end def

    readonly? true end end
  58. ass UserWithRecentActivity \ < ActiveRecord::Base ef self.table_name "users_with_recent_activity" nd

  59. class UserWithRecentActivity \ < ActiveRecord::Base def self.table_name "users_with_recent_activity" end def

    readonly? true end end
  60. def self.table_name "users_with_recent_acti end def readonly? true end end

  61. Will it work with full text search?

  62. ⚲Textacular

  63. None
  64. Takes care of the Full Text Search portions of queries

  65. Search over every text field on a record

  66. Variant search options like basic_search, fuzzy search, and advanced search

  67. Game.basic_search('Sonic')

  68. Game.basic_search( title: 'Mario', system: 'Nintendo' )

  69. So let's go back and look at the search we

    wrote
  70. SELECT id AS article_id, title || ' ' || body

    AS the_text FROM articles UNION SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id;
  71. Now our search result is really simple in the Rails

    side
  72. class Search < ActiveRecord::Base include Textacular belongs_to :article end

  73. class Search < ActiveRecord::Base include Textacular belongs_to :article end

  74. Search.basic_search("Sandi").map(&:article)

  75. Search.basic_search("Sandi").map(&:article)

  76. class SearchResult include Enumerable def initialize(query) @results = Search.basic_search(query) end

    def each @results.each end end
  77. CREATE Migration

  78. class CreateUsersWithRecentActivity < ActiveRecord::Migration def up ActiveRecord::Base.connection.execute( "--The CREATE VIEW

    sql from before" ) end def down ActiveRecord::Base.connection.execute( 'DROP VIEW users_with_recent_activity' ) end end
  79. How resistant to change is it?

  80. Let's find out - time for some feature creep!

  81. Articles whose comments match the query

  82. Searching on • Article (title, body) • Author (name) •

    Comments (body)
  83. SELECT id AS article_id, title || ' ' || body

    AS the_text FROM articles UNION SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id UNION SELECT article_id, body AS the_text FROM comments
  84. authors.name AS the_tex FROM authors JOIN articles ON authors.id =

    article UNION SELECT article_id, body AS the_text FROM comments
  85. UPDATE Migration

  86. class CreateUsersWithRecentActivity < ActiveRecord::Migration def up ActiveRecord::Base.connection.execute( "CREATE OR REPLACE

    VIEW searches AS --The sql from the new view" ) end def down ActiveRecord::Base.connection.execute( "CREATE OR REPLACE VIEW searches AS --The sql from the old view" ) end end
  87. UPDATE Migration

  88. class CreateUsersWithRecentActivity < ActiveRecord::Migration def up ActiveRecord::Base.connection.execute( "DROP VIEW searches;

    CREATE VIEW searches AS --The sql from the new view" ) end def down ActiveRecord::Base.connection.execute( "DROP VIEW searches; CREATE VIEW searches AS --The sql from the old view" ) end end
  89. Can't be dumped to db/schema.rb

  90. config.database_format = :structure db/structure.sql

  91. Scenic

  92. None
  93. [show how scenic gets migrations from view.sql definitions]

  94. [show that it has versions]

  95. Model generator $ rails generate scenic:model search create app/models/search.rb create

    db/views/searches_v01.sql create db/migrate/..._create_searches.rb
  96. $ cat app/views/searches_v01.sql SELECT id AS article_id, title || '

    ' || body AS the_text FROM articles UNION SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id
  97. View generator $ rails generate scenic:view search create db/views/searches_v02.sql create

    db/migrate/..._update_searches_to_version_2.rb
  98. $ cat app/views/searches_v02.sql SELECT id AS article_id, title || '

    ' || body AS the_text FROM articles UNION SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id
  99. $ cat app/views/searches_v02.sql SELECT id AS article_id, title || '

    ' || body AS the_text FROM articles UNION SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id UNION SELECT article_id, body AS the_text FROM comments
  100. Performance

  101. This query is pretty slow

  102. it has to search across three tables to get us

    all of the results we need
  103. Full Text Search ~400 ms ILIKE ~40 ms Full Text

    Search vs ILIKE
  104. Views can't be indexed

  105. Underlying tables can

  106. Add indices

  107. There are several types of indices

  108. btree

  109. GIN & GIST

  110. GIN index lookups are about three times faster than GiST

  111. GIN indexes take about three times longer to build than

    GiST
  112. GIN indexes are moderately slower to update than GiST indexes

  113. GIN indexes are two-to-three times larger than GiST indexes

  114. Use GIN • You don't have hundreds of thousands of

    rows • You're not concerned about longer writes blocking the db • You're adding the index late in the game • You don't care about disk space • You want fast lookups (read-heavy)
  115. Use GiST • Very large tables • You have performance

    concerns • For some reason disk space is important • Your table is write-heavy
  116. add_index :articles, :body, using: :gin add_index :articles, :title, using: :gin

    add_index :authors, :name, using: :gin add_index :comments, :body, using: :gin
  117. Materialized Views

  118. Pre-populate the results of the view

  119. Query against result set

  120. Full Text Search ~400 ms ILIKE ~40 ms Materialized ~

    5.394 ms
  121. REFRESH MATERIALIZED VIEW searches

  122. after_commit :refresh def refresh class.connection.execute( "REFRESH MATERIALIZED VIEW #{table_name}" )

    end
  123. What about pre-built solutions?

  124. ElasticSearch via Tire or elasticsearch-rails or Chewy require 'elasticsearch/model' class

    Article < ActiveRecord::Base include Elasticsearch::Model include Elasticsearch::Model::Callbacks end
  125. Solr via Sunspot class Post < ActiveRecord::Base searchable do text

    :title, :body text :comments do comments.map { |comment| comment.body } end boolean :featured integer :blog_id integer :author_id integer :category_ids, :multiple => true double :average_rating time :published_at time :expired_at string :sort_title do title.downcase.gsub(/^(an?|the)/, '') end end end
  126. text :title, :body text :comments do comments.map {|comment| comment.body} end

  127. boolean :featured integer :blog_id integer :author_id integer :category_ids, :multiple =>

    true double :average_rating time :published_at time :expired_at
  128. string :sort_title do title.downcase.gsub(/^(an?|the)/, '') end

  129. sphinx with ThinkingSphinx ThinkingSphinx::Index.define :article, :with => :active_record do indexes

    subject, :sortable => true indexes content indexes author.name, :as => :author, :sortable => true has author_id, created_at, updated_at end
  130. These services excel at faceted search

  131. These services excel at faceted search More difficult with full

    text search
  132. Run on your development machine.

  133. Run on your production machine.

  134. Needs to be faked in tests

  135. Some of these have lots of cruft in models.

  136. Remove a data concern from your database

  137. Arcane syntax

  138. :(

  139. By combining

  140. materialized views

  141. full text search

  142. Rails magic ✨

  143. we have a pretty cool search feature

  144. that doesn't require any new dependencies

  145. :)

  146. Thank you

  147. Bibliography • http://shisaa.jp/postset/postgresql-full-text-search-part-1.html • http://blog.lostpropertyhq.com/postgres-full-text-search-is-good- enough/ • http://www.postgresql.org/docs/9.3/static/textsearch.html • http://linuxgazette.net/164/sephton.html

    • http://www.postgresql.org/docs/current/static/sql-createview.html • http://www.postgresql.org/docs/current/static/indexes-examine.html • http://www.postgresql.org/docs/current/static/textsearch-indexes.html • http://en.wikipedia.org/wiki/Gin_and_tonic
  148. None
  149. None
  150. http://calebthompson.io/talks/search.html

  151. Come say hi That’s why I do this