Slide 1

Slide 1 text

Postgres The Best Tool You're Already Using •  Adam Sanderson •  LiquidPlanner 1

Slide 2

Slide 2 text

Adam Sanderson I have been a full stack engineer at LiquidPlanner for 5 years. •  I got off in Kansas*, and that's ok! •  Github: adamsanderson •  Twitter: adamsanderson •  Blog: http://monkeyandcrow.com * Seattle 2

Slide 3

Slide 3 text

Online project management with probabilistic scheduling. •  Started in 2007 with Rails 1.x •  Used Postgres from the beginning •  We have learned some great techniques along the way 3

Slide 4

Slide 4 text

Topics •  Tagging •  Hierarchy •  Custom Data •  Full Text Search 4

Slide 5

Slide 5 text

Method For each topic, we'll cover the SQL before we cover its use in ActiveRecord. We will use Postgres 9.x, Ruby 1.9 syntax, and ActiveRecord 4.0. If you understand the SQL you can use it in any version of ActiveRecord, 4.0 just makes it easier. 5

Slide 6

Slide 6 text

Backstory You just built a great new social network for hedgehog lovers around the world, HedgeWith.me. Everything is going well. You have a few users, but now they want more. 6

Slide 7

Slide 7 text

My hedgehog is afraid of grumpy hedgehogs, but likes cute ones how can I find him friends? hedgehogs4life Tagging People want to be able to tag their hedgehogs, and then find other hedgehogs with certain tags. “ 7

Slide 8

Slide 8 text

Defining Arrays in SQL C R E A T E T A B L E h e d g e h o g s ( i d i n t e g e r p r i m a r y k e y , n a m e t e x t , a g e i n t e g e r , t a g s t e x t [ ] ) ; 8

Slide 9

Slide 9 text

Defining Arrays in ActiveRecord c r e a t e _ t a b l e : h e d g e h o g s d o | t | t . s t r i n g : n a m e t . i n t e g e r : a g e t . t e x t : t a g s , a r r a y : t r u e e n d ActiveRecord 4.x introduced arrays for Postgres, use a r r a y : t r u e 9

Slide 10

Slide 10 text

Heads Up Define array columns as t . t e x t instead of t . s t r i n g to avoid casting. Postgres assumes that A R R A Y [ ' c u t e ' , ' c u d d l y ' ] is of type t e x t [ ] and will require you to cast, otherwise you will see errors like this: E R R O R : o p e r a t o r d o e s n o t e x i s t : c h a r a c t e r v a r y i n g [ ] & & t e x t [ ] 10

Slide 11

Slide 11 text

Boolean Set Operators You can use the set operators to query arrays. •   A @ > B A contains all of B •   A & & B A overlaps any of B 11

Slide 12

Slide 12 text

Querying Tags in SQL Find all the hedgehogs that are spiny or prickly: S E L E C T n a m e , t a g s F R O M h e d g e h o g s W H E R E t a g s & & A R R A Y [ ' s p i n y ' , ' p r i c k l y ' ] ; A & & B A overlaps any of B 12

Slide 13

Slide 13 text

Querying Tags in SQL name tags Marty spiny, prickly, cute Quilby cuddly, prickly, hungry Thomas grumpy, prickly, sleepy, spiny Franklin spiny, round, tiny 13

Slide 14

Slide 14 text

Querying Tags in SQL Find all the hedgehogs that are spiny and prickly: S E L E C T n a m e , t a g s F R O M h e d g e h o g s W H E R E t a g s @ > A R R A Y [ ' s p i n y ' , ' p r i c k l y ' ] ; A @ > B A contains all the B 14

Slide 15

Slide 15 text

Querying Tags in SQL name tags Marty spiny, prickly, cute Thomas grumpy, prickly, sleepy, spiny 15

Slide 16

Slide 16 text

Querying Tags in ActiveRecord Find all the hedgehogs that are spiny and prickly H e d g e h o g . w h e r e " t a g s @ > A R R A Y [ ? ] " , [ ' s p i n y ' , ' p r i c k l y ' ] 16

Slide 17

Slide 17 text

Querying Tags in ActiveRecord Create scopes to encapsulate set operations: c l a s s H e d g e h o g < A c t i v e R e c o r d : : B a s e s c o p e : a n y _ t a g s , - > ( * t a g s ) { w h e r e ( ' t a g s & & A R R A Y [ ? ] ' , t a g s ) } s c o p e : a l l _ t a g s , - > ( * t a g s ) { w h e r e ( ' t a g s @ > A R R A Y [ ? ] ' , t a g s ) } e n d 17

Slide 18

Slide 18 text

Querying Tags in ActiveRecord Find all the hedgehogs that are spiny or large, and older than 4: H e d g e h o g . a n y _ t a g s ( ' s p i n y ' , ' l a r g e ' ) . w h e r e ( ' a g e > ? ' , 4 ) 18

Slide 19

Slide 19 text

Hi, I run an influential hedgehog club. Our members would all use HedgeWith.me, if they could show which hogs are members of our selective society. Boston Spine Fancy President Hierarchy Apparently there are thousands of hedgehog leagues, divisions, societies, clubs, and so forth. “ 19

Slide 20

Slide 20 text

Hierarchy We need to efficiently model a club hierarchy like this: •  North American League •  Western Division •  Cascadia Hog Friends •  Californian Hedge Society How can we support operations like finding a club's depth, children, or parents? 20

Slide 21

Slide 21 text

Materialized Path in SQL Encode the parent ids of each record in its p a t h. C R E A T E T A B L E c l u b s ( i d i n t e g e r p r i m a r y k e y , n a m e t e x t , p a t h i n t e g e r [ ] ) ; 21

Slide 22

Slide 22 text

Querying a Materialized Path id name path 1 North American League [1] 2 Eastern Division [1,2] 4 New York Quillers [1,2,4] 5 Boston Spine Fancy [1,2,5] 3 Western Division [1,3] 6 Cascadia Hog Friends [1,3,6] 7 California Hedge Society [1,3,7] ... 22

Slide 23

Slide 23 text

Materialized Path: Depth The depth of each club is simply the length of its path. •   a r r a y _ l e n g t h ( a r r a y , d i m ) returns the length of the array d i m will always be 1 unless you are using multidimensional arrays. 23

Slide 24

Slide 24 text

Materialized Path: Depth Display the top two tiers of hedgehog clubs: S E L E C T n a m e , p a t h , a r r a y _ l e n g t h ( p a t h , 1 ) A S d e p t h F R O M c l u b s W H E R E a r r a y _ l e n g t h ( p a t h , 1 ) < = 2 O R D E R B Y p a t h ; a r r a y _ l e n g t h ( p a t h , 1 ) is the depth of record 24

Slide 25

Slide 25 text

Materialized Path: Depth name path depth North American League [1] 1 Eastern Division [1,2] 2 Western Division [1,3] 2 South American League [9] 1 25

Slide 26

Slide 26 text

Materialized Path: Children Find all the clubs that are children of the California Hedge Society, ID: 7. S E L E C T i d , n a m e , p a t h F R O M c l u b s W H E R E p a t h & & A R R A Y [ 7 ] O R D E R B Y p a t h A & & B A overlaps any of B 26

Slide 27

Slide 27 text

Materialized Path: Children id name path 7 Californian Hedge Society [1,3,7] 8 Real Hogs of the OC [1,3,7,8] 12 Hipster Hogs [1,3,7,12] Apparently it is illegal to own hedgehogs in California 27

Slide 28

Slide 28 text

Materialized Path: Parents Find the parents of the California Hedge Society, Path: A R R A Y [ 1 , 3 , 7 ]. S E L E C T n a m e , p a t h F R O M c l u b s W H E R E A R R A Y [ i d ] & & A R R A Y [ 1 , 3 , 7 ] O R D E R B Y p a t h ; A & & B A overlaps any of B 28

Slide 29

Slide 29 text

Materialized Path: Parents id name path 1 North American League [1] 3 Western Division [1,3] 7 Californian Hedge Society [1,3,7] 29

Slide 30

Slide 30 text

ActiveRecord: Arrays & Depth With ActiveRecord 4.x, p a t h is just ruby array. c l a s s C l u b < A c t i v e R e c o r d : : B a s e d e f d e p t h s e l f . p a t h . l e n g t h e n d . . . 30

Slide 31

Slide 31 text

Querying in ActiveRecord Encapsulate these conditions as instance methods: c l a s s C l u b < A c t i v e R e c o r d : : B a s e d e f c h i l d r e n C l u b . w h e r e ( ' p a t h & & A R R A Y [ ? ] ' , s e l f . i d ) e n d d e f p a r e n t s C l u b . w h e r e ( ' A R R A Y [ i d ] & & A R R A Y [ ? ] ' , s e l f . p a t h ) e n d 31

Slide 32

Slide 32 text

Querying in ActiveRecord Now we have an easy way to query the hierarchy. @ c l u b . p a r e n t s . l i m i t ( 5 ) @ c l u b . c h i l d r e n . j o i n s ( : h e d g e h o g s ) . m e r g e ( H e d g e h o g . a n y _ t a g s ( ' s i l l y ' ) ) These features can all work together. Mind blown? 32

Slide 33

Slide 33 text

I need to keep track of my hedgehogs' favorite foods, colors, weight, eye color, and shoe sizes! the Quantified Hedgehog Owner If I am forced to enter my hedgehog's shoe size, I will quit immediately! the Unquantified Hedgehog Owner Custom Data Your users want to record arbitrary data about their hedgehogs. “ 33

Slide 34

Slide 34 text

Hstore Hstore provides a hash column type. It is a useful alternative to ActiveRecord's s e r i a l i z e where the keys and values can be queried in Postgres. 34

Slide 35

Slide 35 text

Hstore Hstore needs to be installed manually. Your migration will look like this: c l a s s I n s t a l l H s t o r e < A c t i v e R e c o r d : : M i g r a t i o n d e f u p e x e c u t e ' C R E A T E E X T E N S I O N h s t o r e ' e n d . . . 35

Slide 36

Slide 36 text

Heads Up Although hstore is supported by ActiveRecord 4.x, the default schema format does not support extensions. Update c o n f i g / a p p l i c a t i o n . r b to use the SQL schema format, otherwise your tests will fail. c l a s s A p p l i c a t i o n < R a i l s : : A p p l i c a t i o n c o n f i g . a c t i v e _ r e c o r d . s c h e m a _ f o r m a t = : s q l e n d 36

Slide 37

Slide 37 text

Defining an Hstore in SQL C R E A T E T A B L E h e d g e h o g s ( i d i n t e g e r p r i m a r y k e y , n a m e t e x t , a g e i n t e g e r , t a g s t e x t [ ] , c u s t o m h s t o r e D E F A U L T ' ' N O T N U L L ) ; 37

Slide 38

Slide 38 text

Defining an Hstore in ActiveRecord h s t o r e is supported in ActiveRecord 4.x as a normal column type: c r e a t e _ t a b l e : h e d g e h o g s d o | t | t . s t r i n g : n a m e t . i n t e g e r : a g e t . t e x t : t a g s , a r r a y : t r u e t . h s t o r e : c u s t o m , : d e f a u l t = > ' ' , : n u l l = > f a l s e e n d 38

Slide 39

Slide 39 text

Heads Up Save yourself some hassle, and specify an empty hstore by default: t . h s t o r e : c u s t o m , : d e f a u l t = > ' ' , : n u l l = > f a l s e Otherwise new records will have null hstores. 39

Slide 40

Slide 40 text

Hstore Format Hstore uses a text format, it looks a lot like a ruby 1.8 hash: U P D A T E h e d g e h o g s S E T c u s t o m = ' " f a v o r i t e _ f o o d " = > " l e m o n s " , " w e i g h t " = > " 2 l b s " ' W H E R E i d = 1 ; Be careful of quoting. 40

Slide 41

Slide 41 text

Hstore Operators Common functions and operators: •   d e f i n e d ( A , B ) Does A have B? •   A - > B Get B from A. In ruby this would be A[B] 41

Slide 42

Slide 42 text

Query Hstore in SQL Find all the favorite foods of the hedgehogs: S E L E C T n a m e , c u s t o m - > ' f a v o r i t e _ f o o d ' A S f o o d F R O M h e d g e h o g s W H E R E d e f i n e d ( c u s t o m , ' f a v o r i t e _ f o o d ' ) ; d e f i n e d ( A , B ) Does A have B? A - > B Get B from A. In ruby this would be A[B] 42

Slide 43

Slide 43 text

Query Hstore in SQL name food Horrace lemons Quilby pasta Thomas grubs 43

Slide 44

Slide 44 text

Query Hstore in ActiveRecord Create scopes to make querying easier: c l a s s H e d g e h o g < A c t i v e R e c o r d : : B a s e s c o p e : h a s _ k e y , - > ( k e y ) { w h e r e ( ' d e f i n e d ( c u s t o m , ? ) ' , k e y ) } s c o p e : h a s _ v a l u e , - > ( k e y , v a l u e ) { w h e r e ( ' c u s t o m - > ? = ? ' , k e y , v a l u e ) } . . . 44

Slide 45

Slide 45 text

Query Hstore in ActiveRecord Find hedgehogs with a custom c o l o r: H e d g e h o g . h a s _ k e y ( ' c o l o r ' ) 45

Slide 46

Slide 46 text

Query Hstore in ActiveRecord Find hedgehogs that are brown: H e d g e h o g . h a s _ v a l u e ( ' c o l o r ' , ' b r o w n ' ) 46

Slide 47

Slide 47 text

Query Hstore in ActiveRecord Find all the silly, brown, hedgehogs: H e d g e h o g . a n y _ t a g s ( ' s i l l y ' ) . h a s _ v a l u e ( ' c o l o r ' , ' b r o w n ' ) 47

Slide 48

Slide 48 text

Updating an Hstore with ActiveRecord With ActiveRecord 4.x, hstore columns are just hashes: h e d g e h o g . c u s t o m [ " f a v o r i t e _ c o l o r " ] = " o c h r e " h e d g e h o g . c u s t o m = { f a v o r i t e _ f o o d : " P e a n u t s " , s h o e _ s i z e : 3 } 48

Slide 49

Slide 49 text

Heads Up Hstore columns are always stored as strings: h e d g e h o g . c u s t o m [ " w e i g h t " ] = 3 h e d g e h o g . s a v e ! h e d g e h o g . r e l o a d h e d g e h o g . c u s t o m [ ' w e i g h t ' ] . c l a s s # = > S t r i n g 49

Slide 50

Slide 50 text

Someone commented on my hedgehog. They said they enjoy his beady little eyes, but I can't find it. hogmama73 Full Text Search Your users want to be able to search within their comments. “ 50

Slide 51

Slide 51 text

Full Text Search in SQL C R E A T E T A B L E c o m m e n t s ( i d i n t e g e r p r i m a r y k e y , h e d g e h o g _ i d i n t e g e r , b o d y t e x t ) ; 51

Slide 52

Slide 52 text

Full Text Search Data Types There are two important data types: •   t s v e c t o r represents the text to be searched •   t s q u e r y represents the search query 52

Slide 53

Slide 53 text

Full Text Search Functions There are two main functions that convert strings into these types: •   t o _ t s v e c t o r ( c o n f i g u r a t i o n , t e x t ) creates a normalized t s v e c t o r •   t o _ t s q u e r y ( c o n f i g u r a t i o n , t e x t ) creates a normalized t s q u e r y 53

Slide 54

Slide 54 text

Full Text Search Normalization Postgres removes common stop words: s e l e c t t o _ t s v e c t o r ( ' A b o y a n d h i s h e d g e h o g w e n t t o P o r t l a n d ' ) ; - - b o y , h e d g e h o g , p o r t l a n d , w e n t s e l e c t t o _ t s v e c t o r ( ' I n e e d a s e c o n d l i n e t o f i l l s p a c e h e r e . ' ) ; - - f i l l , l i n e , n e e d , s e c o n d , s p a c e 54

Slide 55

Slide 55 text

Full Text Search Normalization Stemming removes common endings from words: term stemmed hedgehogs hedgehog enjoying enjoy piping pipe 55

Slide 56

Slide 56 text

Full Text Search Operators Vectors: •   V @ @ Q Searches V for Q Queries: •   V @ @ ( A & & B ) Searches V for A and B •   V @ @ ( A | | B ) Searches V for A or B 56

Slide 57

Slide 57 text

Full Text Search Querying Find comments about "enjoying" something: S E L E C T b o d y F R O M c o m m e n t s W H E R E t o _ t s v e c t o r ( ' e n g l i s h ' , b o d y ) @ @ t o _ t s q u e r y ( ' e n g l i s h ' , ' e n j o y i n g ' ) ; V @ @ Q Searches V for Q 57

Slide 58

Slide 58 text

Full Text Search Querying •  Does he enjoy beets? Mine loves them •  I really enjoy oranges •  I am enjoying these photos of your hedgehog's beady little eyes •  Can I feed him grapes? I think he enjoys them. Notice how "enjoying" also matched "enjoy" and "enjoys" due to stemming. 58

Slide 59

Slide 59 text

Full Text Search Wildcards •   t o _ t s q u e r y ( ' e n g l i s h ' , ' c a t : * ' ) Searches for anything starting with cat Such as: cat, catapult, cataclysmic. But not: octocat, scatter, prognosticate 59

Slide 60

Slide 60 text

Full Text Search Wild Cards Find comments containing the term "oil", and a word starting with "quil" : S E L E C T b o d y F R O M c o m m e n t s W H E R E t o _ t s v e c t o r ( ' e n g l i s h ' , b o d y ) @ @ ( t o _ t s q u e r y ( ' e n g l i s h ' , ' o i l ' ) & & t o _ t s q u e r y ( ' e n g l i s h ' , ' q u i l : * ' ) ) ; V @ @ ( A & & B ) Searches V for A and B 60

Slide 61

Slide 61 text

Full Text Search Querying •  What brand of oil do you use? Have you tried QuillSwill? 61

Slide 62

Slide 62 text

Heads Up t s q u e r y only supports wildcards at the end of a term. While q u i l l : * will match "QuillSwill", but * : s w i l l will not. In fact, * : s w i l l will throw an error. 62

Slide 63

Slide 63 text

Even More Heads Up! Never pass user input directly to t o _ t s q u e r y, it has a strict mini search syntax. The following all fail: •   h t t p : / / l o c a l h o s t : has a special meaning •   O ' R e i l l y ' s B o o k s Paired quotes cannot be in the middle •   A & & B & and | are used for combining terms You need to sanitize queries, or use a gem that does this for you. 63

Slide 64

Slide 64 text

Full Text Search With ActiveRecord We can wrap this up in a scope. c l a s s C o m m e n t < A c t i v e R e c o r d : : B a s e s c o p e : s e a r c h _ a l l , - > ( q u e r y ) { w h e r e ( " t o _ t s v e c t o r ( ' e n g l i s h ' , b o d y ) @ @ # { s a n i t i z e _ q u e r y ( q u e r y ) } " ) } You need to write s a n i t i z e _ q u e r y, or use a gem that does this for you. 64

Slide 65

Slide 65 text

Full Text Search With ActiveRecord Find the comments about quill oil again, and limit it to 5 results: C o m m e n t . s e a r c h _ a l l ( " q u i l * o i l " ) . l i m i t ( 5 ) Since s e a r c h _ a l l is a scope, we chain it like all the other examples. 65

Slide 66

Slide 66 text

Full Text Search Indexing Create an index on the function call t o _ t s v e c t o r ( ' e n g l i s h ' , b o d y ): C R E A T E I N D E X c o m m e n t s _ g i n _ i n d e x O N c o m m e n t s U S I N G g i n ( t o _ t s v e c t o r ( ' e n g l i s h ' , b o d y ) ) ; The g i n index is a special index for multivalued columns like a t e x t [ ] or a t s v e c t o r 66

Slide 67

Slide 67 text

Heads Up Since we are indexing a function call, t o _ t s v e c t o r ( ' e n g l i s h ' , b o d y ), we must call it the same way every time. You don't have to use e n g l i s h, but you do need to be consistent. 67

Slide 68

Slide 68 text

In Summary •  Arrays can model tagging and hierarchies •  Hstore can be used to model custom data •  Postgres supports full text search You can now enjoy the happy hour! S E L E C T * F R O M b e e r s W H E R E t r a i t s @ > A R R A Y [ ' h o p p y ' , ' f l o r a l ' ] 68

Slide 69

Slide 69 text

Any Questions? Possible suggestions: •  Why not normalize your database instead of using arrays? •  Can I see how you implemented s a n i t i z e _ q u e r y? •  What is a good gem for full text search? •  What about ActiveRecord 2 and 3? •  Why hstore instead of JSON? •  Can I buy you coffee? 69

Slide 70

Slide 70 text

Extra Resources •  ActiveRecord Queries & Scopes •  Postgres Array Operators •  Postgres Hstore Documentation •  Postgres Full Text Search •  Ruby Gems for Full Text Search •  Textacular Supports Active Record 2.x and 3.x •  pg_search Supports Active Record 3.x, but has more features •  My Blog, Github, and favorite social network •  How to draw a hedgehog. 70

Slide 71

Slide 71 text

Bonus Here's s a n i t i z e _ q u e r y: d e f s e l f . s a n i t i z e _ q u e r y ( q u e r y , c o n j u n c t i o n = ' & & ' ) " ( " + t o k e n i z e _ q u e r y ( q u e r y ) . m a p { | t | t e r m ( t ) } . j o i n ( c o n j u n c t i o n ) + " ) " e n d It breaks up the user's request into terms, and then joins them together. 71

Slide 72

Slide 72 text

Bonus We tokenize by splitting on white space, &, |, and :. d e f s e l f . t o k e n i z e _ q u e r y ( q u e r y ) q u e r y . s p l i t ( / ( \ s | [ & | : ] ) + / ) e n d 72

Slide 73

Slide 73 text

Bonus Each of those tokens gets rewritten: d e f s e l f . t e r m ( t ) # S t r i p l e a d i n g a p o s t r o p h e s , t h e y a r e n e v e r l e g a l , " ' o k " b e c o m e s " o k " t = t . g s u b ( / ^ ' + / , ' ' ) # S t r i p a n y * s t h a t a r e n o t a t t h e e n d o f t h e t e r m t = t . g s u b ( / \ * [ ^ $ ] / , ' ' ) # R e w r i t e " s e a r * " a s " s e a r : * " t o s u p p o r t w i l d c a r d m a t c h i n g o n t e r m s t = t . g s u b ( / \ * $ / , ' : * ' ) . . . 73

Slide 74

Slide 74 text

. . . # I f t h e o n l y r e m a i n i n g t e x t i s a w i l d c a r d , r e t u r n a n e m p t y s t r i n g t = " " i f t . m a t c h ( / ^ [ : * ] + $ / ) " t o _ t s q u e r y ( ' e n g l i s h ' , # { q u o t e _ v a l u e t } ) " e n d 74