Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Postgres: The Best Tool You're Already Using

Postgres: The Best Tool You're Already Using

Postgres is a great database, this talk covers four useful techniques that you can use to build new features into your applications right now.

We cover the SQL first, then its application in ActiveRecord 4.

65bd9e4c5aebde25ebf16d599339d570?s=128

Adam Sanderson

April 30, 2013
Tweet

Transcript

  1. Postgres The Best Tool You're Already Using •  Adam Sanderson

    •  LiquidPlanner 1
  2. Adam Sanderson I have been a full stack engineer at

    LiquidPlanner for 5 years. •  I got off in Kansas*, and that's ok! •  Github: adamsanderson •  Twitter: adamsanderson •  Blog: http://monkeyandcrow.com * Seattle 2
  3. Online project management with probabilistic scheduling. •  Started in 2007

    with Rails 1.x •  Used Postgres from the beginning •  We have learned some great techniques along the way 3
  4. Topics •  Tagging •  Hierarchy •  Custom Data •  Full

    Text Search 4
  5. Method For each topic, we'll cover the SQL before we

    cover its use in ActiveRecord. We will use Postgres 9.x, Ruby 1.9 syntax, and ActiveRecord 4.0. If you understand the SQL you can use it in any version of ActiveRecord, 4.0 just makes it easier. 5
  6. Backstory You just built a great new social network for

    hedgehog lovers around the world, HedgeWith.me. Everything is going well. You have a few users, but now they want more. 6
  7. My hedgehog is afraid of grumpy hedgehogs, but likes cute

    ones how can I find him friends? hedgehogs4life Tagging People want to be able to tag their hedgehogs, and then find other hedgehogs with certain tags. “ 7
  8. Defining Arrays in SQL C R E A T E

    T A B L E h e d g e h o g s ( i d i n t e g e r p r i m a r y k e y , n a m e t e x t , a g e i n t e g e r , t a g s t e x t [ ] ) ; 8
  9. Defining Arrays in ActiveRecord c r e a t e

    _ t a b l e : h e d g e h o g s d o | t | t . s t r i n g : n a m e t . i n t e g e r : a g e t . t e x t : t a g s , a r r a y : t r u e e n d ActiveRecord 4.x introduced arrays for Postgres, use a r r a y : t r u e 9
  10. Heads Up Define array columns as t . t e

    x t instead of t . s t r i n g to avoid casting. Postgres assumes that A R R A Y [ ' c u t e ' , ' c u d d l y ' ] is of type t e x t [ ] and will require you to cast, otherwise you will see errors like this: E R R O R : o p e r a t o r d o e s n o t e x i s t : c h a r a c t e r v a r y i n g [ ] & & t e x t [ ] 10
  11. Boolean Set Operators You can use the set operators to

    query arrays. •   A @ > B A contains all of B •   A & & B A overlaps any of B 11
  12. Querying Tags in SQL Find all the hedgehogs that are

    spiny or prickly: S E L E C T n a m e , t a g s F R O M h e d g e h o g s W H E R E t a g s & & A R R A Y [ ' s p i n y ' , ' p r i c k l y ' ] ; A & & B A overlaps any of B 12
  13. Querying Tags in SQL name tags Marty spiny, prickly, cute

    Quilby cuddly, prickly, hungry Thomas grumpy, prickly, sleepy, spiny Franklin spiny, round, tiny 13
  14. Querying Tags in SQL Find all the hedgehogs that are

    spiny and prickly: S E L E C T n a m e , t a g s F R O M h e d g e h o g s W H E R E t a g s @ > A R R A Y [ ' s p i n y ' , ' p r i c k l y ' ] ; A @ > B A contains all the B 14
  15. Querying Tags in SQL name tags Marty spiny, prickly, cute

    Thomas grumpy, prickly, sleepy, spiny 15
  16. Querying Tags in ActiveRecord Find all the hedgehogs that are

    spiny and prickly H e d g e h o g . w h e r e " t a g s @ > A R R A Y [ ? ] " , [ ' s p i n y ' , ' p r i c k l y ' ] 16
  17. Querying Tags in ActiveRecord Create scopes to encapsulate set operations:

    c l a s s H e d g e h o g < A c t i v e R e c o r d : : B a s e s c o p e : a n y _ t a g s , - > ( * t a g s ) { w h e r e ( ' t a g s & & A R R A Y [ ? ] ' , t a g s ) } s c o p e : a l l _ t a g s , - > ( * t a g s ) { w h e r e ( ' t a g s @ > A R R A Y [ ? ] ' , t a g s ) } e n d 17
  18. Querying Tags in ActiveRecord Find all the hedgehogs that are

    spiny or large, and older than 4: H e d g e h o g . a n y _ t a g s ( ' s p i n y ' , ' l a r g e ' ) . w h e r e ( ' a g e > ? ' , 4 ) 18
  19. Hi, I run an influential hedgehog club. Our members would

    all use HedgeWith.me, if they could show which hogs are members of our selective society. Boston Spine Fancy President Hierarchy Apparently there are thousands of hedgehog leagues, divisions, societies, clubs, and so forth. “ 19
  20. Hierarchy We need to efficiently model a club hierarchy like

    this: •  North American League •  Western Division •  Cascadia Hog Friends •  Californian Hedge Society How can we support operations like finding a club's depth, children, or parents? 20
  21. Materialized Path in SQL Encode the parent ids of each

    record in its p a t h. C R E A T E T A B L E c l u b s ( i d i n t e g e r p r i m a r y k e y , n a m e t e x t , p a t h i n t e g e r [ ] ) ; 21
  22. Querying a Materialized Path id name path 1 North American

    League [1] 2 Eastern Division [1,2] 4 New York Quillers [1,2,4] 5 Boston Spine Fancy [1,2,5] 3 Western Division [1,3] 6 Cascadia Hog Friends [1,3,6] 7 California Hedge Society [1,3,7] ... 22
  23. Materialized Path: Depth The depth of each club is simply

    the length of its path. •   a r r a y _ l e n g t h ( a r r a y , d i m ) returns the length of the array d i m will always be 1 unless you are using multidimensional arrays. 23
  24. Materialized Path: Depth Display the top two tiers of hedgehog

    clubs: S E L E C T n a m e , p a t h , a r r a y _ l e n g t h ( p a t h , 1 ) A S d e p t h F R O M c l u b s W H E R E a r r a y _ l e n g t h ( p a t h , 1 ) < = 2 O R D E R B Y p a t h ; a r r a y _ l e n g t h ( p a t h , 1 ) is the depth of record 24
  25. Materialized Path: Depth name path depth North American League [1]

    1 Eastern Division [1,2] 2 Western Division [1,3] 2 South American League [9] 1 25
  26. Materialized Path: Children Find all the clubs that are children

    of the California Hedge Society, ID: 7. S E L E C T i d , n a m e , p a t h F R O M c l u b s W H E R E p a t h & & A R R A Y [ 7 ] O R D E R B Y p a t h A & & B A overlaps any of B 26
  27. Materialized Path: Children id name path 7 Californian Hedge Society

    [1,3,7] 8 Real Hogs of the OC [1,3,7,8] 12 Hipster Hogs [1,3,7,12] Apparently it is illegal to own hedgehogs in California 27
  28. Materialized Path: Parents Find the parents of the California Hedge

    Society, Path: A R R A Y [ 1 , 3 , 7 ]. S E L E C T n a m e , p a t h F R O M c l u b s W H E R E A R R A Y [ i d ] & & A R R A Y [ 1 , 3 , 7 ] O R D E R B Y p a t h ; A & & B A overlaps any of B 28
  29. Materialized Path: Parents id name path 1 North American League

    [1] 3 Western Division [1,3] 7 Californian Hedge Society [1,3,7] 29
  30. ActiveRecord: Arrays & Depth With ActiveRecord 4.x, p a t

    h is just ruby array. c l a s s C l u b < A c t i v e R e c o r d : : B a s e d e f d e p t h s e l f . p a t h . l e n g t h e n d . . . 30
  31. Querying in ActiveRecord Encapsulate these conditions as instance methods: c

    l a s s C l u b < A c t i v e R e c o r d : : B a s e d e f c h i l d r e n C l u b . w h e r e ( ' p a t h & & A R R A Y [ ? ] ' , s e l f . i d ) e n d d e f p a r e n t s C l u b . w h e r e ( ' A R R A Y [ i d ] & & A R R A Y [ ? ] ' , s e l f . p a t h ) e n d 31
  32. Querying in ActiveRecord Now we have an easy way to

    query the hierarchy. @ c l u b . p a r e n t s . l i m i t ( 5 ) @ c l u b . c h i l d r e n . j o i n s ( : h e d g e h o g s ) . m e r g e ( H e d g e h o g . a n y _ t a g s ( ' s i l l y ' ) ) These features can all work together. Mind blown? 32
  33. I need to keep track of my hedgehogs' favorite foods,

    colors, weight, eye color, and shoe sizes! the Quantified Hedgehog Owner If I am forced to enter my hedgehog's shoe size, I will quit immediately! the Unquantified Hedgehog Owner Custom Data Your users want to record arbitrary data about their hedgehogs. “ 33
  34. Hstore Hstore provides a hash column type. It is a

    useful alternative to ActiveRecord's s e r i a l i z e where the keys and values can be queried in Postgres. 34
  35. Hstore Hstore needs to be installed manually. Your migration will

    look like this: c l a s s I n s t a l l H s t o r e < A c t i v e R e c o r d : : M i g r a t i o n d e f u p e x e c u t e ' C R E A T E E X T E N S I O N h s t o r e ' e n d . . . 35
  36. Heads Up Although hstore is supported by ActiveRecord 4.x, the

    default schema format does not support extensions. Update c o n f i g / a p p l i c a t i o n . r b to use the SQL schema format, otherwise your tests will fail. c l a s s A p p l i c a t i o n < R a i l s : : A p p l i c a t i o n c o n f i g . a c t i v e _ r e c o r d . s c h e m a _ f o r m a t = : s q l e n d 36
  37. Defining an Hstore in SQL C R E A T

    E T A B L E h e d g e h o g s ( i d i n t e g e r p r i m a r y k e y , n a m e t e x t , a g e i n t e g e r , t a g s t e x t [ ] , c u s t o m h s t o r e D E F A U L T ' ' N O T N U L L ) ; 37
  38. Defining an Hstore in ActiveRecord h s t o r

    e is supported in ActiveRecord 4.x as a normal column type: c r e a t e _ t a b l e : h e d g e h o g s d o | t | t . s t r i n g : n a m e t . i n t e g e r : a g e t . t e x t : t a g s , a r r a y : t r u e t . h s t o r e : c u s t o m , : d e f a u l t = > ' ' , : n u l l = > f a l s e e n d 38
  39. Heads Up Save yourself some hassle, and specify an empty

    hstore by default: t . h s t o r e : c u s t o m , : d e f a u l t = > ' ' , : n u l l = > f a l s e Otherwise new records will have null hstores. 39
  40. Hstore Format Hstore uses a text format, it looks a

    lot like a ruby 1.8 hash: U P D A T E h e d g e h o g s S E T c u s t o m = ' " f a v o r i t e _ f o o d " = > " l e m o n s " , " w e i g h t " = > " 2 l b s " ' W H E R E i d = 1 ; Be careful of quoting. 40
  41. Hstore Operators Common functions and operators: •   d e

    f i n e d ( A , B ) Does A have B? •   A - > B Get B from A. In ruby this would be A[B] 41
  42. Query Hstore in SQL Find all the favorite foods of

    the hedgehogs: S E L E C T n a m e , c u s t o m - > ' f a v o r i t e _ f o o d ' A S f o o d F R O M h e d g e h o g s W H E R E d e f i n e d ( c u s t o m , ' f a v o r i t e _ f o o d ' ) ; d e f i n e d ( A , B ) Does A have B? A - > B Get B from A. In ruby this would be A[B] 42
  43. Query Hstore in SQL name food Horrace lemons Quilby pasta

    Thomas grubs 43
  44. Query Hstore in ActiveRecord Create scopes to make querying easier:

    c l a s s H e d g e h o g < A c t i v e R e c o r d : : B a s e s c o p e : h a s _ k e y , - > ( k e y ) { w h e r e ( ' d e f i n e d ( c u s t o m , ? ) ' , k e y ) } s c o p e : h a s _ v a l u e , - > ( k e y , v a l u e ) { w h e r e ( ' c u s t o m - > ? = ? ' , k e y , v a l u e ) } . . . 44
  45. Query Hstore in ActiveRecord Find hedgehogs with a custom c

    o l o r: H e d g e h o g . h a s _ k e y ( ' c o l o r ' ) 45
  46. Query Hstore in ActiveRecord Find hedgehogs that are brown: H

    e d g e h o g . h a s _ v a l u e ( ' c o l o r ' , ' b r o w n ' ) 46
  47. Query Hstore in ActiveRecord Find all the silly, brown, hedgehogs:

    H e d g e h o g . a n y _ t a g s ( ' s i l l y ' ) . h a s _ v a l u e ( ' c o l o r ' , ' b r o w n ' ) 47
  48. Updating an Hstore with ActiveRecord With ActiveRecord 4.x, hstore columns

    are just hashes: h e d g e h o g . c u s t o m [ " f a v o r i t e _ c o l o r " ] = " o c h r e " h e d g e h o g . c u s t o m = { f a v o r i t e _ f o o d : " P e a n u t s " , s h o e _ s i z e : 3 } 48
  49. Heads Up Hstore columns are always stored as strings: h

    e d g e h o g . c u s t o m [ " w e i g h t " ] = 3 h e d g e h o g . s a v e ! h e d g e h o g . r e l o a d h e d g e h o g . c u s t o m [ ' w e i g h t ' ] . c l a s s # = > S t r i n g 49
  50. Someone commented on my hedgehog. They said they enjoy his

    beady little eyes, but I can't find it. hogmama73 Full Text Search Your users want to be able to search within their comments. “ 50
  51. Full Text Search in SQL C R E A T

    E T A B L E c o m m e n t s ( i d i n t e g e r p r i m a r y k e y , h e d g e h o g _ i d i n t e g e r , b o d y t e x t ) ; 51
  52. Full Text Search Data Types There are two important data

    types: •   t s v e c t o r represents the text to be searched •   t s q u e r y represents the search query 52
  53. Full Text Search Functions There are two main functions that

    convert strings into these types: •   t o _ t s v e c t o r ( c o n f i g u r a t i o n , t e x t ) creates a normalized t s v e c t o r •   t o _ t s q u e r y ( c o n f i g u r a t i o n , t e x t ) creates a normalized t s q u e r y 53
  54. Full Text Search Normalization Postgres removes common stop words: s

    e l e c t t o _ t s v e c t o r ( ' A b o y a n d h i s h e d g e h o g w e n t t o P o r t l a n d ' ) ; - - b o y , h e d g e h o g , p o r t l a n d , w e n t s e l e c t t o _ t s v e c t o r ( ' I n e e d a s e c o n d l i n e t o f i l l s p a c e h e r e . ' ) ; - - f i l l , l i n e , n e e d , s e c o n d , s p a c e 54
  55. Full Text Search Normalization Stemming removes common endings from words:

    term stemmed hedgehogs hedgehog enjoying enjoy piping pipe 55
  56. Full Text Search Operators Vectors: •   V @ @

    Q Searches V for Q Queries: •   V @ @ ( A & & B ) Searches V for A and B •   V @ @ ( A | | B ) Searches V for A or B 56
  57. Full Text Search Querying Find comments about "enjoying" something: S

    E L E C T b o d y F R O M c o m m e n t s W H E R E t o _ t s v e c t o r ( ' e n g l i s h ' , b o d y ) @ @ t o _ t s q u e r y ( ' e n g l i s h ' , ' e n j o y i n g ' ) ; V @ @ Q Searches V for Q 57
  58. Full Text Search Querying •  Does he enjoy beets? Mine

    loves them •  I really enjoy oranges •  I am enjoying these photos of your hedgehog's beady little eyes •  Can I feed him grapes? I think he enjoys them. Notice how "enjoying" also matched "enjoy" and "enjoys" due to stemming. 58
  59. Full Text Search Wildcards •   t o _ t

    s q u e r y ( ' e n g l i s h ' , ' c a t : * ' ) Searches for anything starting with cat Such as: cat, catapult, cataclysmic. But not: octocat, scatter, prognosticate 59
  60. Full Text Search Wild Cards Find comments containing the term

    "oil", and a word starting with "quil" : S E L E C T b o d y F R O M c o m m e n t s W H E R E t o _ t s v e c t o r ( ' e n g l i s h ' , b o d y ) @ @ ( t o _ t s q u e r y ( ' e n g l i s h ' , ' o i l ' ) & & t o _ t s q u e r y ( ' e n g l i s h ' , ' q u i l : * ' ) ) ; V @ @ ( A & & B ) Searches V for A and B 60
  61. Full Text Search Querying •  What brand of oil do

    you use? Have you tried QuillSwill? 61
  62. Heads Up t s q u e r y only

    supports wildcards at the end of a term. While q u i l l : * will match "QuillSwill", but * : s w i l l will not. In fact, * : s w i l l will throw an error. 62
  63. Even More Heads Up! Never pass user input directly to

    t o _ t s q u e r y, it has a strict mini search syntax. The following all fail: •   h t t p : / / l o c a l h o s t : has a special meaning •   O ' R e i l l y ' s B o o k s Paired quotes cannot be in the middle •   A & & B & and | are used for combining terms You need to sanitize queries, or use a gem that does this for you. 63
  64. Full Text Search With ActiveRecord We can wrap this up

    in a scope. c l a s s C o m m e n t < A c t i v e R e c o r d : : B a s e s c o p e : s e a r c h _ a l l , - > ( q u e r y ) { w h e r e ( " t o _ t s v e c t o r ( ' e n g l i s h ' , b o d y ) @ @ # { s a n i t i z e _ q u e r y ( q u e r y ) } " ) } You need to write s a n i t i z e _ q u e r y, or use a gem that does this for you. 64
  65. Full Text Search With ActiveRecord Find the comments about quill

    oil again, and limit it to 5 results: C o m m e n t . s e a r c h _ a l l ( " q u i l * o i l " ) . l i m i t ( 5 ) Since s e a r c h _ a l l is a scope, we chain it like all the other examples. 65
  66. Full Text Search Indexing Create an index on the function

    call t o _ t s v e c t o r ( ' e n g l i s h ' , b o d y ): C R E A T E I N D E X c o m m e n t s _ g i n _ i n d e x O N c o m m e n t s U S I N G g i n ( t o _ t s v e c t o r ( ' e n g l i s h ' , b o d y ) ) ; The g i n index is a special index for multivalued columns like a t e x t [ ] or a t s v e c t o r 66
  67. Heads Up Since we are indexing a function call, t

    o _ t s v e c t o r ( ' e n g l i s h ' , b o d y ), we must call it the same way every time. You don't have to use e n g l i s h, but you do need to be consistent. 67
  68. In Summary •  Arrays can model tagging and hierarchies •

     Hstore can be used to model custom data •  Postgres supports full text search You can now enjoy the happy hour! S E L E C T * F R O M b e e r s W H E R E t r a i t s @ > A R R A Y [ ' h o p p y ' , ' f l o r a l ' ] 68
  69. Any Questions? Possible suggestions: •  Why not normalize your database

    instead of using arrays? •  Can I see how you implemented s a n i t i z e _ q u e r y? •  What is a good gem for full text search? •  What about ActiveRecord 2 and 3? •  Why hstore instead of JSON? •  Can I buy you coffee? 69
  70. Extra Resources •  ActiveRecord Queries & Scopes •  Postgres Array

    Operators •  Postgres Hstore Documentation •  Postgres Full Text Search •  Ruby Gems for Full Text Search •  Textacular Supports Active Record 2.x and 3.x •  pg_search Supports Active Record 3.x, but has more features •  My Blog, Github, and favorite social network •  How to draw a hedgehog. 70
  71. Bonus Here's s a n i t i z e

    _ q u e r y: d e f s e l f . s a n i t i z e _ q u e r y ( q u e r y , c o n j u n c t i o n = ' & & ' ) " ( " + t o k e n i z e _ q u e r y ( q u e r y ) . m a p { | t | t e r m ( t ) } . j o i n ( c o n j u n c t i o n ) + " ) " e n d It breaks up the user's request into terms, and then joins them together. 71
  72. Bonus We tokenize by splitting on white space, &, |,

    and :. d e f s e l f . t o k e n i z e _ q u e r y ( q u e r y ) q u e r y . s p l i t ( / ( \ s | [ & | : ] ) + / ) e n d 72
  73. Bonus Each of those tokens gets rewritten: d e f

    s e l f . t e r m ( t ) # S t r i p l e a d i n g a p o s t r o p h e s , t h e y a r e n e v e r l e g a l , " ' o k " b e c o m e s " o k " t = t . g s u b ( / ^ ' + / , ' ' ) # S t r i p a n y * s t h a t a r e n o t a t t h e e n d o f t h e t e r m t = t . g s u b ( / \ * [ ^ $ ] / , ' ' ) # R e w r i t e " s e a r * " a s " s e a r : * " t o s u p p o r t w i l d c a r d m a t c h i n g o n t e r m s t = t . g s u b ( / \ * $ / , ' : * ' ) . . . 73
  74. . . . # I f t h e o

    n l y r e m a i n i n g t e x t i s a w i l d c a r d , r e t u r n a n e m p t y s t r i n g t = " " i f t . m a t c h ( / ^ [ : * ] + $ / ) " t o _ t s q u e r y ( ' e n g l i s h ' , # { q u o t e _ v a l u e t } ) " e n d 74