Slide 1

Slide 1 text

Multi-table Full Text Search Postgres with

Slide 2

Slide 2 text

@calebthompson calebthompson.io

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

I'm going to talk to you about

Slide 6

Slide 6 text

A real-life feature.

Slide 7

Slide 7 text

Full Text Search

Slide 8

Slide 8 text

Multi-table Full Text Search Postgres with

Slide 9

Slide 9 text

Views

Slide 10

Slide 10 text

Performance

Slide 11

Slide 11 text

Materialized Views

Slide 12

Slide 12 text

Gems

Slide 13

Slide 13 text

Other Options

Slide 14

Slide 14 text

Let's search for Articles

Slide 15

Slide 15 text

Article.where("body LIKE %?%", query)

Slide 16

Slide 16 text

Exact substrings

Slide 17

Slide 17 text

but that's not very useful

Slide 18

Slide 18 text

Article.where("body ILIKE %?%", query)

Slide 19

Slide 19 text

Search on title

Slide 20

Slide 20 text

Article.where( "body ILIKE %?% OR title ILIKE %?%", query, query)

Slide 21

Slide 21 text

Search by Author's name

Slide 22

Slide 22 text

Scopes def self.search(query) joins(:user) .where(<<-SQL, query, query, query) articles.body ILIKE %?% OR articles.title ILIKE %?% OR users.name ILIKE %?% SQL end

Slide 23

Slide 23 text

Query object class Search def self.for(query) [ Article.where("title ILIKE %?%", query), Article.where("body ILIKE %?%", query), Article.joins(:user) .where("users.name ILIKE %?%", query), ].flatten.uniq end end

Slide 24

Slide 24 text

Poor results

Slide 25

Slide 25 text

Full Text Search Enter

Slide 26

Slide 26 text

natural language searching

Slide 27

Slide 27 text

remove stop words and, the, also, they, would

Slide 28

Slide 28 text

eliminate casing “Factory” and “factory” should return the same results

Slide 29

Slide 29 text

synonyms hunger and hungrily

Slide 30

Slide 30 text

stemming “try” and “trying”, “tries”, and “tried” will be recorded in the index under the single concept word "tri."

Slide 31

Slide 31 text

SELECT DISTINCT(id) FROM ( SELECT id AS id, title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id ) AS this_doesnt_matter WHERE to_tsvector(the_text) @@ to_tsquery('?');

Slide 32

Slide 32 text

SELECT DISTINCT(id) FROM ( SELECT id AS id, title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id

Slide 33

Slide 33 text

id AS id, title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id

Slide 34

Slide 34 text

SELECT DISTINCT(id) FROM ( SELECT id AS id, title || ' ' || body A FROM articles UNION SELECT articles.id AS id, authors.name AS the_te

Slide 35

Slide 35 text

title || ' ' || body A FROM articles UNION SELECT articles.id AS id, authors.name AS the_te FROM authors JOIN articles ON authors.id = articles ) AS this_doesnt_matter WHERE

Slide 36

Slide 36 text

SELECT DISTINCT(id) FROM ( SELECT id AS id, title || ' ' || body AS t FROM articles UNION SELECT

Slide 37

Slide 37 text

That's a lot of SQL.

Slide 38

Slide 38 text

We could throw that into our query object class Search def self.for(query) <<-SQL SQL end end

Slide 39

Slide 39 text

Scopes def self.search(query) where(<<-SQL, query) SQL end

Slide 40

Slide 40 text

Postgres has our answer

Slide 41

Slide 41 text

Views

Slide 42

Slide 42 text

Partial queries

Slide 43

Slide 43 text

Stored in the Database

Slide 44

Slide 44 text

Can be SELECTed from

Slide 45

Slide 45 text

Return set of columns

Slide 46

Slide 46 text

Multi- source

Slide 47

Slide 47 text

Complete the query

Slide 48

Slide 48 text

CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id) users.*, activities.created_at AS active_at FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days';

Slide 49

Slide 49 text

CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id) users.*, activities.created_at AS active_at FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days';

Slide 50

Slide 50 text

CREATE VIEW users_with_recent_activity SELECT DISTINCT ON (users.id) users.*, activities.created_at AS active FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days

Slide 51

Slide 51 text

CREATE VIEW users_with_recent AS SELECT DISTINCT ON (users.id) users.*, activities.created_at active_at FROM users JOIN activities ON activities.user_id = u

Slide 52

Slide 52 text

VIEW users_with_recent_activity AS DISTINCT ON (users.id) users.*, activities.created_at AS active_at users activities activities.user_id = users.id activities.created_at >= CURRENT_DATE - interval '7 days';

Slide 53

Slide 53 text

CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id) users.*, activities.created_at AS active_at FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days';

Slide 54

Slide 54 text

Querying a view looks just like querying a table SELECT * FROM users_with_recent_activity WHERE id IN (1,2,3…) ORDER BY active_at DESC

Slide 55

Slide 55 text

ActiveRecord can use a view as its backend

Slide 56

Slide 56 text

So we can create a fairly vanilla model

Slide 57

Slide 57 text

class UserWithRecentActivity \ < ActiveRecord::Base def self.table_name "users_with_recent_activity" end def readonly? true end end

Slide 58

Slide 58 text

ass UserWithRecentActivity \ < ActiveRecord::Base ef self.table_name "users_with_recent_activity" nd

Slide 59

Slide 59 text

class UserWithRecentActivity \ < ActiveRecord::Base def self.table_name "users_with_recent_activity" end def readonly? true end end

Slide 60

Slide 60 text

def self.table_name "users_with_recent_acti end def readonly? true end end

Slide 61

Slide 61 text

Will it work with full text search?

Slide 62

Slide 62 text

⚲Textacular

Slide 63

Slide 63 text

No content

Slide 64

Slide 64 text

Takes care of the Full Text Search portions of queries

Slide 65

Slide 65 text

Search over every text field on a record

Slide 66

Slide 66 text

Variant search options like basic_search, fuzzy search, and advanced search

Slide 67

Slide 67 text

Game.basic_search('Sonic')

Slide 68

Slide 68 text

Game.basic_search( title: 'Mario', system: 'Nintendo' )

Slide 69

Slide 69 text

So let's go back and look at the search we wrote

Slide 70

Slide 70 text

SELECT id AS article_id, title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id;

Slide 71

Slide 71 text

Now our search result is really simple in the Rails side

Slide 72

Slide 72 text

class Search < ActiveRecord::Base include Textacular belongs_to :article end

Slide 73

Slide 73 text

class Search < ActiveRecord::Base include Textacular belongs_to :article end

Slide 74

Slide 74 text

Search.basic_search("Sandi").map(&:article)

Slide 75

Slide 75 text

Search.basic_search("Sandi").map(&:article)

Slide 76

Slide 76 text

class SearchResult include Enumerable def initialize(query) @results = Search.basic_search(query) end def each @results.each end end

Slide 77

Slide 77 text

CREATE Migration

Slide 78

Slide 78 text

class CreateUsersWithRecentActivity < ActiveRecord::Migration def up ActiveRecord::Base.connection.execute( "--The CREATE VIEW sql from before" ) end def down ActiveRecord::Base.connection.execute( 'DROP VIEW users_with_recent_activity' ) end end

Slide 79

Slide 79 text

How resistant to change is it?

Slide 80

Slide 80 text

Let's find out - time for some feature creep!

Slide 81

Slide 81 text

Articles whose comments match the query

Slide 82

Slide 82 text

Searching on • Article (title, body) • Author (name) • Comments (body)

Slide 83

Slide 83 text

SELECT id AS article_id, title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id UNION SELECT article_id, body AS the_text FROM comments

Slide 84

Slide 84 text

authors.name AS the_tex FROM authors JOIN articles ON authors.id = article UNION SELECT article_id, body AS the_text FROM comments

Slide 85

Slide 85 text

UPDATE Migration

Slide 86

Slide 86 text

class CreateUsersWithRecentActivity < ActiveRecord::Migration def up ActiveRecord::Base.connection.execute( "CREATE OR REPLACE VIEW searches AS --The sql from the new view" ) end def down ActiveRecord::Base.connection.execute( "CREATE OR REPLACE VIEW searches AS --The sql from the old view" ) end end

Slide 87

Slide 87 text

UPDATE Migration

Slide 88

Slide 88 text

class CreateUsersWithRecentActivity < ActiveRecord::Migration def up ActiveRecord::Base.connection.execute( "DROP VIEW searches; CREATE VIEW searches AS --The sql from the new view" ) end def down ActiveRecord::Base.connection.execute( "DROP VIEW searches; CREATE VIEW searches AS --The sql from the old view" ) end end

Slide 89

Slide 89 text

Can't be dumped to db/schema.rb

Slide 90

Slide 90 text

config.database_format = :structure db/structure.sql

Slide 91

Slide 91 text

Scenic

Slide 92

Slide 92 text

No content

Slide 93

Slide 93 text

[show how scenic gets migrations from view.sql definitions]

Slide 94

Slide 94 text

[show that it has versions]

Slide 95

Slide 95 text

Model generator $ rails generate scenic:model search create app/models/search.rb create db/views/searches_v01.sql create db/migrate/..._create_searches.rb

Slide 96

Slide 96 text

$ cat app/views/searches_v01.sql SELECT id AS article_id, title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id

Slide 97

Slide 97 text

View generator $ rails generate scenic:view search create db/views/searches_v02.sql create db/migrate/..._update_searches_to_version_2.rb

Slide 98

Slide 98 text

$ cat app/views/searches_v02.sql SELECT id AS article_id, title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id

Slide 99

Slide 99 text

$ cat app/views/searches_v02.sql SELECT id AS article_id, title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id UNION SELECT article_id, body AS the_text FROM comments

Slide 100

Slide 100 text

Performance

Slide 101

Slide 101 text

This query is pretty slow

Slide 102

Slide 102 text

it has to search across three tables to get us all of the results we need

Slide 103

Slide 103 text

Full Text Search ~400 ms ILIKE ~40 ms Full Text Search vs ILIKE

Slide 104

Slide 104 text

Views can't be indexed

Slide 105

Slide 105 text

Underlying tables can

Slide 106

Slide 106 text

Add indices

Slide 107

Slide 107 text

There are several types of indices

Slide 108

Slide 108 text

btree

Slide 109

Slide 109 text

GIN & GIST

Slide 110

Slide 110 text

GIN index lookups are about three times faster than GiST

Slide 111

Slide 111 text

GIN indexes take about three times longer to build than GiST

Slide 112

Slide 112 text

GIN indexes are moderately slower to update than GiST indexes

Slide 113

Slide 113 text

GIN indexes are two-to-three times larger than GiST indexes

Slide 114

Slide 114 text

Use GIN • You don't have hundreds of thousands of rows • You're not concerned about longer writes blocking the db • You're adding the index late in the game • You don't care about disk space • You want fast lookups (read-heavy)

Slide 115

Slide 115 text

Use GiST • Very large tables • You have performance concerns • For some reason disk space is important • Your table is write-heavy

Slide 116

Slide 116 text

add_index :articles, :body, using: :gin add_index :articles, :title, using: :gin add_index :authors, :name, using: :gin add_index :comments, :body, using: :gin

Slide 117

Slide 117 text

Materialized Views

Slide 118

Slide 118 text

Pre-populate the results of the view

Slide 119

Slide 119 text

Query against result set

Slide 120

Slide 120 text

Full Text Search ~400 ms ILIKE ~40 ms Materialized ~ 5.394 ms

Slide 121

Slide 121 text

REFRESH MATERIALIZED VIEW searches

Slide 122

Slide 122 text

after_commit :refresh def refresh class.connection.execute( "REFRESH MATERIALIZED VIEW #{table_name}" ) end

Slide 123

Slide 123 text

What about pre-built solutions?

Slide 124

Slide 124 text

ElasticSearch via Tire or elasticsearch-rails or Chewy require 'elasticsearch/model' class Article < ActiveRecord::Base include Elasticsearch::Model include Elasticsearch::Model::Callbacks end

Slide 125

Slide 125 text

Solr via Sunspot class Post < ActiveRecord::Base searchable do text :title, :body text :comments do comments.map { |comment| comment.body } end boolean :featured integer :blog_id integer :author_id integer :category_ids, :multiple => true double :average_rating time :published_at time :expired_at string :sort_title do title.downcase.gsub(/^(an?|the)/, '') end end end

Slide 126

Slide 126 text

text :title, :body text :comments do comments.map {|comment| comment.body} end

Slide 127

Slide 127 text

boolean :featured integer :blog_id integer :author_id integer :category_ids, :multiple => true double :average_rating time :published_at time :expired_at

Slide 128

Slide 128 text

string :sort_title do title.downcase.gsub(/^(an?|the)/, '') end

Slide 129

Slide 129 text

sphinx with ThinkingSphinx ThinkingSphinx::Index.define :article, :with => :active_record do indexes subject, :sortable => true indexes content indexes author.name, :as => :author, :sortable => true has author_id, created_at, updated_at end

Slide 130

Slide 130 text

These services excel at faceted search

Slide 131

Slide 131 text

These services excel at faceted search More difficult with full text search

Slide 132

Slide 132 text

Run on your development machine.

Slide 133

Slide 133 text

Run on your production machine.

Slide 134

Slide 134 text

Needs to be faked in tests

Slide 135

Slide 135 text

Some of these have lots of cruft in models.

Slide 136

Slide 136 text

Remove a data concern from your database

Slide 137

Slide 137 text

Arcane syntax

Slide 138

Slide 138 text

:(

Slide 139

Slide 139 text

By combining

Slide 140

Slide 140 text

materialized views

Slide 141

Slide 141 text

full text search

Slide 142

Slide 142 text

Rails magic ✨

Slide 143

Slide 143 text

we have a pretty cool search feature

Slide 144

Slide 144 text

that doesn't require any new dependencies

Slide 145

Slide 145 text

:)

Slide 146

Slide 146 text

Thank you

Slide 147

Slide 147 text

Bibliography • http://shisaa.jp/postset/postgresql-full-text-search-part-1.html • http://blog.lostpropertyhq.com/postgres-full-text-search-is-good- enough/ • http://www.postgresql.org/docs/9.3/static/textsearch.html • http://linuxgazette.net/164/sephton.html • http://www.postgresql.org/docs/current/static/sql-createview.html • http://www.postgresql.org/docs/current/static/indexes-examine.html • http://www.postgresql.org/docs/current/static/textsearch-indexes.html • http://en.wikipedia.org/wiki/Gin_and_tonic

Slide 148

Slide 148 text

No content

Slide 149

Slide 149 text

No content

Slide 150

Slide 150 text

http://calebthompson.io/talks/search.html

Slide 151

Slide 151 text

Come say hi That’s why I do this