
Understanding Language and Fixing WP Search

xyu
September 27, 2014


Let's face it, search in WordPress is kinda terrible, and trying to find related content is even worse. The current WordPress tech stack restricts us to some extremely outdated search methods. Thankfully, help is on the way. In this talk, Xiao explains what makes Elasticsearch such a powerful tool and how it complements almost any WordPress install.



Transcript

  1. WordPress on NGINX + HHVM with Heroku Buildpacks
    It’s been a year since I last made any major changes to my WordPress on Heroku build, and in tech years that’s a lifetime. Since then Heroku has released a new PHP buildpack with nginx and HHVM built in. Much progress has also been made on both HHVM and WordPress to make them compatible with each other. So it seems like now is as good a time as any to update the stack this site is running on. So without further ado, I’d like to introduce Heroku WP: a template for HHVM-powered WordPress served by nginx.
    The Goal: There are numerous other templates out there for running WordPress on Heroku, and my main goals for this template are:
    It should be simple: use the default buildpack provided by Heroku, so there’s no other 3rd-party dependency to implicitly trust or to maintain.
    It should be fast: use the latest technologies available to squeeze every last ounce of performance out of each Heroku Dyno.
    It should be secure: security is not an add-on; admin pages should be secure by default, and database connections need to be encrypted.
    It should scale: just because we can serve millions of page hits a day off a single Heroku Dyno does not mean we’ll stop there. The template should be made with cloud architecture in mind so that the number of Dynos can scale up and down without breaking.
    The Stack: Standing on the shoulders of giants, I was able to use the latest Heroku buildpack and get WordPress running on:
    NGINX: an event-driven web server that was engineered for the modern day to replace Apache. This high-performance web server is preferred by more top 1,000 sites than any other, and it’s what’s used by the largest WordPress install out there, WordPress.com.
    HHVM: HipHop Virtual Machine, a JIT (just-in-time) compiler developed by Facebook to run PHP scripts, which when tested with WordPress showed up to a 2x improvement.
    I have yet to run any statistical analysis on performance; however, anecdotally it feels a lot faster navigating WP admin, and page generation times look much better. I’m looking forward to running more tests and performance tuning this build in the coming weeks.
    Update: While still not a head-to-head test, looking at the response times reported by StatusCake for this site running on Heroku-WP and for a mirror of this site running on the old Heroku LAMP stack (with no load other than StatusCake pings) shows a dramatic improvement.
  2. SELECT COUNT(*) FROM wp_posts WHERE post_content LIKE '%WordPress%';
    SELECT COUNT(*) FROM wp_posts WHERE post_content LIKE '%on%';
    SELECT COUNT(*) FROM wp_posts WHERE post_content LIKE '%NGINX%';
    …
    SELECT COUNT(*) FROM wp_posts WHERE post_content LIKE '%improvement%';
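To make the cost of this approach concrete, here is a minimal Python sketch that mimics one `LIKE '%term%'` count per query word. The `posts` dict is hypothetical sample data, and the case-insensitive match assumes a case-insensitive collation, which is common for MySQL text columns but not guaranteed. Every call scans every post, and a plain substring test also counts `on` inside `connections`.

```python
# Hypothetical stand-in for wp_posts.post_content (not data from the talk).
posts = {
    1: "WordPress on NGINX + HHVM with Heroku Buildpacks",
    2: "Database connections need to be encrypted",
    3: "I almost ran into a swarm of baby ducks this morning",
}

def like_count(term: str) -> int:
    """Mimic SELECT COUNT(*) ... WHERE post_content LIKE '%term%':
    a case-insensitive substring test against every row (a full scan)."""
    needle = term.lower()
    return sum(needle in text.lower() for text in posts.values())

for term in ["WordPress", "on", "NGINX", "improvement"]:
    print(term, like_count(term))  # "on" also matches inside "connections"
```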
  3. (Same blog post text as slide 1.)
  4. (Same blog post text as slide 1.)
  5. SELECT *
    FROM wp_posts
    WHERE
      post_content LIKE '%WordPress%' OR
      post_content LIKE '%NGINX%' OR
      post_content LIKE '%HHVM%' OR
      post_content LIKE '%Heroku%' OR
      post_content LIKE '%performance%';

  6. SELECT *
    FROM wp_posts
    WHERE
      post_content LIKE '%WordPress%' OR
      post_content LIKE '%NGINX%' OR
      post_content LIKE '%HHVM%' OR
      post_content LIKE '%Heroku%' OR
      post_content LIKE '%performance%'
    ORDER BY !?
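One way to read the `ORDER BY !?` punchline: SQL offers no ready-made relevance column for this query, so any ranking has to be hand-rolled. A sketch of what that might look like, assuming hypothetical posts and using "number of matched terms" as a deliberately crude relevance score (it is not how Elasticsearch scores results):

```python
# Hypothetical posts standing in for wp_posts rows (illustrative only).
posts = {
    1: "WordPress on NGINX + HHVM with Heroku Buildpacks",
    2: "Tuning WordPress performance",
    3: "I almost ran into a swarm of baby ducks this morning",
}
terms = ["WordPress", "NGINX", "HHVM", "Heroku", "performance"]

def matched_terms(text: str) -> int:
    """How many of the OR'ed LIKE '%term%' clauses this text satisfies."""
    lower = text.lower()
    return sum(term.lower() in lower for term in terms)

scores = {pid: matched_terms(text) for pid, text in posts.items()}
# WHERE ... OR ...: keep posts that match at least one term.
hits = [pid for pid, score in scores.items() if score > 0]
# Crude "ORDER BY relevance": posts matching more terms come first.
ranking = sorted(hits, key=lambda pid: scores[pid], reverse=True)
print(ranking)  # [1, 2]
```

Counting matches ignores how rare or common each word is, which is exactly the kind of judgment a search engine's scoring is built to make.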
  7. 2 Data Stores: more complexity
    2 Data Stores: more points of failure
    2 Data Stores: more cost
  8. My Blog to a Human: “I almost ran into a swarm of baby ducks this morning…”
  9. My Blog to MySQL:
    0000000 49 20 61 6c 6d 6f 73 74 20 72 61 6e 20 69 6e 74
    0000010 6f 20 61 20 73 77 61 72 6d 20 6f 66 20 62 61 62
    0000020 79 20 64 75 63 6b 73 20 74 68 69 73 20 6d 6f 72
    0000030 6e 69 6e 67 e2 80 a6 …
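The dump is just the UTF-8 bytes of the sentence. A small Python sketch reproduces it, assuming the post is stored as UTF-8; the layout (16 bytes per row, prefixed with a hex offset) matches the slide, though which tool produced the original is not stated. Nothing in those bytes marks where words begin or end, let alone that "ran" is a form of "run".

```python
def hex_rows(data: bytes, width: int = 16) -> list[str]:
    """Format bytes as rows of `width` hex values, each prefixed with a hex offset."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        rows.append(f"{offset:07x} " + " ".join(f"{b:02x}" for b in chunk))
    return rows

text = "I almost ran into a swarm of baby ducks this morning\u2026"
for row in hex_rows(text.encode("utf-8")):
    print(row)
```

The closing ellipsis takes three bytes (`e2 80 a6`), a reminder that the database sees bytes, not characters or words.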
  10. (Same byte dump as slide 9.)
  11. (Same byte dump as slide 9.)
  12. “I almost ran into a swarm of baby ducks this morning…”
    SELECT * FROM wp_posts WHERE post_content LIKE '%running%';
  13. (Same as slide 12.)
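Spelled out in Python, assuming a case-insensitive `LIKE` and using a hypothetical `like` helper: the post says "ran", so neither "running" nor "run" is a substring of it, and the pattern match finds nothing.

```python
post = "I almost ran into a swarm of baby ducks this morning\u2026"

def like(text: str, term: str) -> bool:
    """Hypothetical helper approximating post_content LIKE '%term%' (case-insensitive substring test)."""
    return term.lower() in text.lower()

print(like(post, "running"))  # False
print(like(post, "run"))      # False
print(like(post, "ran"))      # True
```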
  14. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    “The über-quick brown fox jumps over the lazy dogs.”
  15. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    <p>
    The &uuml;ber-quick brown fox
    jumps over the lazy dogs.
    </p>
  16. (Same as slide 15.)
  17. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    The über-quick brown fox
    jumps over the lazy dogs.
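Elasticsearch provides an `html_strip` character filter for this step. The sketch below is only a rough stand-in, not its implementation: it removes tags with a regular expression and decodes entities with Python's `html.unescape`, turning `&uuml;` into `ü`.

```python
import html
import re

raw = "<p>\nThe &uuml;ber-quick brown fox\njumps over the lazy dogs.\n</p>"

def strip_html(text: str) -> str:
    """Rough character-filter stand-in: drop tags, decode entities, trim whitespace."""
    without_tags = re.sub(r"<[^>]+>", "", text)  # removes <p> and </p>
    return html.unescape(without_tags).strip()   # &uuml; becomes ü

print(strip_html(raw))
```

A regex is fine for this toy input but is not a general HTML parser; malformed markup or `>` inside attributes would trip it up.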
  18. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    The über-quick brown fox
    jumps over the lazy dogs.
  19. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    The über quick brown fox
    jumps over the lazy dogs
  20. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    The | über | quick | brown | fox | jumps | over | the | lazy | dogs
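Elasticsearch's `standard` tokenizer splits text using the Unicode Text Segmentation word-boundary rules (UAX #29). A regular expression that keeps runs of word characters is a reasonable approximation for this sentence, though not for every input; in Python 3 the `\w` class is Unicode-aware, so `über` stays intact while the hyphen and the final period disappear.

```python
import re

text = "The über-quick brown fox\njumps over the lazy dogs."

def tokenize(text: str) -> list[str]:
    """Keep runs of word characters; Python's regex word class is Unicode-aware,
    so 'über' survives while the hyphen and final period act as separators."""
    return re.findall(r"\w+", text)

print(tokenize(text))
```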
  21. (Same terms as slide 20.)
  22. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    the | über | quick | brown | fox | jumps | over | the | lazy | dogs
  23. (Same terms as slide 22.)
  24. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    the | uber | quick | brown | fox | jumps | over | the | lazy | dogs
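In Elasticsearch, the `lowercase` and `asciifolding` token filters handle these two steps. A rough Python equivalent follows. It assumes that Unicode NFKD decomposition plus dropping combining marks is close enough for this example. The real ASCII-folding filter covers many characters this trick does not, such as `ß` or `æ`.

```python
import unicodedata

tokens = ["The", "über", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dogs"]

def lowercase(token: str) -> str:
    return token.lower()

def ascii_fold(token: str) -> str:
    """Approximate ASCII folding: decompose (ü -> u + combining diaeresis)
    and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print([ascii_fold(lowercase(t)) for t in tokens])
```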
  25. (Same terms as slide 24.)
  26. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    the | uber | quick | brown | fox | jump | over | the | lazy | dog
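Elasticsearch ships real stemmers, for example Porter- and Snowball-based ones. The function below is a deliberately tiny, hypothetical suffix stripper, just enough to turn `jumps` into `jump` and `dogs` into `dog` here, and to handle query words like "Jumping" and "Foxes" later in the talk. No suffix rule will map the irregular `ran` to `run`; that takes a dictionary of word forms.

```python
def crude_stem(token: str) -> str:
    """Hypothetical, minimal suffix stripper for illustration only;
    real stemmers apply many more rules and conditions."""
    if token.endswith("ing") and len(token) > 5:
        return token[:-3]  # jumping -> jump
    if token.endswith("xes"):
        return token[:-2]  # foxes -> fox
    if token.endswith("s") and not token.endswith("ss"):
        return token[:-1]  # jumps -> jump, dogs -> dog
    return token

tokens = ["the", "uber", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dogs"]
print([crude_stem(t) for t in tokens])
```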
  27. (Same terms as slide 26.)
  28. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    uber | quick | brown | fox | jump | over | lazy | dog
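Removing stop words is a simple set lookup. This sketch uses a tiny illustrative stop list (as on the slide, "over" survives because it is not in the list) and keeps each remaining token's original position. Lucene's stop filter likewise leaves a gap in token positions where a stop word was removed, which matters for phrase matching.

```python
# A tiny illustrative stop list; real stop lists are longer.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}

tokens = ["the", "uber", "quick", "brown", "fox", "jump", "over", "the", "lazy", "dog"]

def remove_stop_words(tokens: list[str]) -> list[tuple[int, str]]:
    """Drop stop words, keeping each remaining token's original position."""
    return [(pos, tok) for pos, tok in enumerate(tokens) if tok not in STOP_WORDS]

print(remove_stop_words(tokens))
```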
  29. (Same terms as slide 28.)
  30. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    uber | quick | brown | fox vulpes | jump | over | lazy | dog canis
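Elasticsearch's `synonym` token filter adds equivalent terms to the token stream. A minimal sketch with a hypothetical two-entry synonym map that mirrors the slide: each synonym is emitted at the same position as the word it expands, so `fox` and `vulpes` share one slot. The positions below are just example values.

```python
# Hypothetical synonym map mirroring the slide.
SYNONYMS = {"fox": ["vulpes"], "dog": ["canis"]}

tokens = [(1, "uber"), (2, "quick"), (3, "brown"), (4, "fox"),
          (5, "jump"), (6, "over"), (8, "lazy"), (9, "dog")]

def expand_synonyms(tokens: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Emit each synonym at the same position as the original token."""
    out = []
    for pos, tok in tokens:
        out.append((pos, tok))
        out.extend((pos, syn) for syn in SYNONYMS.get(tok, []))
    return out

print(expand_synonyms(tokens))
```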
  31. (Same terms as slide 30.)
  32. Elasticsearch Analyzer Chain
    Terms    Doc IDs
    brown    1
    canis    1
    dog      1
    fox      1
    jump     1
    lazy     1
    …
    over     1
    quick    1
    uber     1
    vulpes   1
  33. Elasticsearch Analyzer Chain
    Terms    Doc IDs
    brown    1, 3, 6, …
    canis    1, 2, …
    dog      1, 2, 12, …
    fox      1, 5, 7, …
    jump     1, 6, …
    lazy     1, 7, …
    …        3, 6, 7, …
    over     1, 3, 5, 6, …
    quick    1, 4, …
    uber     1, …
    vulpes   1, 5, 7, …
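The table is an inverted index: each term points at the documents that contain it. A minimal Python sketch, assuming already-analyzed documents (docs 2 and 3 are made up for illustration). Lucene's real postings also record things like term frequencies and positions, which this leaves out.

```python
from collections import defaultdict

# Hypothetical analyzed documents: doc ID -> terms produced by the analyzer chain.
analyzed = {
    1: ["uber", "quick", "brown", "fox", "vulpes", "jump", "over", "lazy", "dog", "canis"],
    2: ["lazy", "dog", "canis"],
    3: ["brown", "fox", "vulpes"],
}

def build_inverted_index(docs: dict[int, list[str]]) -> dict[str, list[int]]:
    """Map each term to the sorted list of doc IDs containing it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(index.items())}

index = build_inverted_index(analyzed)
print(index["dog"], index["fox"])  # [1, 2] [1, 3]
```

Looking up a term is now a dictionary access instead of a scan over every post, which is the core difference from the `LIKE` queries earlier.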
  34. Elasticsearch Analyzer Chain: On Query Text
    Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    “Jumping Foxes”
  35. Elasticsearch Analyzer Chain: On Query Text
    Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    jump | fox vulpes
  36. Elasticsearch Analyzer Chain: On Query Text
    Terms    Doc IDs
    brown    1, 3, 6, …
    canis    1, 2, …
    dog      1, 2, 12, …
    fox      1, 5, 7, …
    jump     1, 6, …
    lazy     1, 7, …
    …        3, 6, 7, …
    over     1, 3, 5, 6, …
    quick    1, 4, …
    uber     1, …
    vulpes   1, 5, 7, …
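Because the query text goes through the same analyzer chain, "Jumping Foxes" becomes `jump`, `fox`, `vulpes` and can match documents indexed from "jumps" and "fox". A sketch of the lookup against a small hypothetical index, counting matched query terms per document as a crude stand-in for scoring:

```python
# Hypothetical inverted index (term -> doc IDs); not the talk's real data.
index = {
    "jump": [1, 4],
    "fox": [1, 2, 3],
    "vulpes": [1, 3],
    "dog": [2, 4],
}

# Query terms as shown on the slide after analysis of "Jumping Foxes".
query_terms = ["jump", "fox", "vulpes"]

def lookup(index: dict[str, list[int]], terms: list[str]) -> dict[int, int]:
    """Count, per document, how many query terms it contains (OR semantics)."""
    hits: dict[int, int] = {}
    for term in terms:
        for doc_id in index.get(term, []):
            hits[doc_id] = hits.get(doc_id, 0) + 1
    return hits

hits = lookup(index, query_terms)
ranking = sorted(hits, key=lambda d: (-hits[d], d))
print(ranking)  # [1, 3, 2, 4]
```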
  37. Elasticsearch Filters & Queries
                Filters               Queries
    Speed       Fast                  Slow(er)
    Cached      Yes, with bitsets!    No
    Matching    Boolean yes/no        Relevancy score
  38. (Same blog post text as slide 1.)
  39. (Same blog post text as slide 1.)
  40. curl -XPOST https://public-api.wordpress.com/rest/v1/sites/www.xyu.io/posts/2361/related -d '{
      "size": 5,
      "filter": {
        "and": [
          { "terms": { "post_format": [ "image", "gallery", "video" ] } },
          { "geo_distance": { "distance": "25mi", "location": [ 41.8236, -71.4222 ] } }
        ]
      }
    }'
    developer.wordpress.com/docs/elasticsearch
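The same related-posts request can be built from Python using only the standard library. This minimal sketch mirrors the curl call: same URL, same body, no explicit JSON Content-Type. It builds the request but does not send it. One thing worth double-checking before relying on the filter: Elasticsearch interprets an array-form geo point as `[lon, lat]`, while `41.8236, -71.4222` reads like latitude first. If the API passes the filter straight through to Elasticsearch, the pair may need to be swapped or written as `{"lat": ..., "lon": ...}`.

```python
import json
import urllib.request

# Endpoint and payload copied from the slide; this only builds the request.
url = "https://public-api.wordpress.com/rest/v1/sites/www.xyu.io/posts/2361/related"
payload = {
    "size": 5,
    "filter": {
        "and": [
            {"terms": {"post_format": ["image", "gallery", "video"]}},
            # Elasticsearch reads a bare [x, y] geo point as [lon, lat].
            {"geo_distance": {"distance": "25mi", "location": [41.8236, -71.4222]}},
        ]
    },
}

request = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"), method="POST")
# urllib.request.urlopen(request) would send it; left out so the sketch runs offline.
print(request.get_method(), request.full_url)
```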