Jetpack Related Posts for Power Users

xyu
August 16, 2014

The Related Posts module in Jetpack looks simple at first glance, but underneath it is powered by Elasticsearch, an advanced natural language search engine. Join us as we peel back the covers to explain how Jetpack uses Elasticsearch to determine what’s related and, more importantly, how to customize it to take full advantage of Elasticsearch’s text analysis abilities.

Transcript

  1. Jetpack Related Posts for Power Users WordCamp Maine (August 2014)

  2. The 13 filters you won't believe are in related posts
      WordCamp Maine (August 2014)
  3. Xiao Yu Code Wrangler — Automattic @HypertextRanch me@xyu.io xyu.io xyu

        
  4. Related Posts

  9. Because finding related content is a hard problem to solve.

  11. WordPress on NGINX + HHVM with Heroku Buildpacks

      WordPress on NGINX + HHVM

      It’s been a year since I last made any major changes to my WordPress on Heroku build, and in tech years that’s a lifetime. Since then Heroku has released a new PHP buildpack with nginx and HHVM built in. Much progress has also been made in both HHVM and WordPress to make them compatible with each other. So now seems as good a time as any to update the stack this site is running on. Without further ado, I’d like to introduce Heroku WP: a template for HHVM-powered WordPress served by nginx.

      The Goal

      There are numerous other templates out there for running WordPress on Heroku. My main goals for this template are:

      • It should be simple: use the default buildpack provided by Heroku so there’s no other third-party dependency to implicitly trust or to maintain.
      • It should be fast: use the latest technologies available to squeeze every last ounce of performance out of each Heroku Dyno.
      • It should be secure: security is not an add-on, admin pages should be secure by default, and database connections need to be encrypted.
      • It should scale: just because we can serve millions of page hits a day off a single Heroku Dyno does not mean we’ll stop there. The template should be made with cloud architecture in mind so that the number of Dynos can scale up and down without breaking.

      The Stack

      Standing on the shoulders of giants, I was able to use the latest Heroku buildpack and get WordPress running on:

      • NGINX: an event-driven web server that was engineered for the modern day to replace Apache. This high-performance web server is preferred by more of the top 1,000 sites than any other, and it’s what’s used by the largest WordPress install out there, WordPress.com.
      • HHVM: HipHop Virtual Machine, a JIT (just-in-time) compiler developed by Facebook to run PHP scripts, which when tested with WordPress showed up to a 2x improvement.

      I have yet to run any statistical analysis on performance; however, anecdotally it feels a lot faster navigating WP admin, and page generation times look much better. I’m looking forward to running more tests and performance tuning this build in the coming weeks.

      Update: While still not a head-to-head test, the response times reported by StatusCake for this site running on Heroku-WP, compared with a mirror of this site running on the old Heroku LAMP stack with no load other than StatusCake pings, show a dramatic improvement:
  12. SELECT COUNT(*)
      FROM wp_posts
      WHERE post_content LIKE "%WordPress%"

      SELECT COUNT(*)
      FROM wp_posts
      WHERE post_content LIKE "%on%"

      SELECT COUNT(*)
      FROM wp_posts
      WHERE post_content LIKE "%NGINX%"

      …

      SELECT COUNT(*)
      FROM wp_posts
      WHERE post_content LIKE "%improvement%"
  13. NOPE

  14. WordPress on NGINX + HHVM with Heroku Buildpacks (same post excerpt as slide 11)
  15. WordPress on NGINX + HHVM with Heroku Buildpacks (same post excerpt as slide 11)
  16. SELECT *
      FROM wp_posts
      WHERE
          post_content LIKE "%WordPress%" OR
          post_content LIKE "%NGINX%" OR
          post_content LIKE "%HHVM%" OR
          post_content LIKE "%Heroku%" OR
          post_content LIKE "%performance%"

  17. SELECT *
      FROM wp_posts
      WHERE
          post_content LIKE "%WordPress%" OR
          post_content LIKE "%NGINX%" OR
          post_content LIKE "%HHVM%" OR
          post_content LIKE "%Heroku%" OR
          post_content LIKE "%performance%"
      ORDER BY
          !?
  19. “Let’s just use tags and categories.” –Some Plugin Author

  20. “Everyone tags and categorizes everything perfectly.” –Some SEO Consultant’s Dream

  21. NOPE

  22. Ok, it’s a hard problem to solve.

  23. WE CAN FIX IT WITH SCIENCE!
  24. Elasticsearch

  25. Understands Language

  26. Analyzing Text

  27. Elasticsearch Analyzer Chain
      Raw Text → Character Filters → Tokenizer → Token Filters → Terms

  28. Elasticsearch Analyzer Chain
      Raw Text → Character Filters → Tokenizer → Token Filters → Terms
      “The über-quick brown fox jumps over the lazy dogs.”

  29. Elasticsearch Analyzer Chain
      Raw Text → Character Filters → Tokenizer → Token Filters → Terms
      <p>
      The &uuml;ber-quick brown fox
      jumps over the lazy dogs.
      </p>

  30. Elasticsearch Analyzer Chain
      Raw Text → Character Filters → Tokenizer → Token Filters → Terms
      <p>
      The &uuml;ber-quick brown fox
      jumps over the lazy dogs.
      </p>

  31. Elasticsearch Analyzer Chain
      Raw Text → Character Filters → Tokenizer → Token Filters → Terms
      The über-quick brown fox
      jumps over the lazy dogs.

  32. Elasticsearch Analyzer Chain
      Raw Text → Character Filters → Tokenizer → Token Filters → Terms
      The über-quick brown fox
      jumps over the lazy dogs.

  33. Elasticsearch Analyzer Chain
      Raw Text → Character Filters → Tokenizer → Token Filters → Terms
      The über quick brown fox
      jumps over the lazy dogs

  34. Elasticsearch Analyzer Chain
      Raw Text → Character Filters → Tokenizer → Token Filters → Terms
      The, über, quick, brown, fox, jumps, over, the, lazy, dogs

  35. Elasticsearch Analyzer Chain
      Raw Text → Character Filters → Tokenizer → Token Filters → Terms
      The, über, quick, brown, fox, jumps, over, the, lazy, dogs

  36. Elasticsearch Analyzer Chain
      Raw Text → Character Filters → Tokenizer → Token Filters → Terms
      the, über, quick, brown, fox, jumps, over, the, lazy, dogs

  37. Elasticsearch Analyzer Chain
      Raw Text → Character Filters → Tokenizer → Token Filters → Terms
      the, über, quick, brown, fox, jumps, over, the, lazy, dogs

  38. Elasticsearch Analyzer Chain
      Raw Text → Character Filters → Tokenizer → Token Filters → Terms
      the, uber, quick, brown, fox, jumps, over, the, lazy, dogs

  39. Elasticsearch Analyzer Chain
      Raw Text → Character Filters → Tokenizer → Token Filters → Terms
      the, uber, quick, brown, fox, jumps, over, the, lazy, dogs

  40. Elasticsearch Analyzer Chain
      Raw Text → Character Filters → Tokenizer → Token Filters → Terms
      the, uber, quick, brown, fox, jump, over, the, lazy, dog

  41. Elasticsearch Analyzer Chain
      Raw Text → Character Filters → Tokenizer → Token Filters → Terms
      the, uber, quick, brown, fox, jump, over, the, lazy, dog

  42. Elasticsearch Analyzer Chain
      Raw Text → Character Filters → Tokenizer → Token Filters → Terms
      uber, quick, brown, fox, jump, over, lazy, dog

  43. Elasticsearch Analyzer Chain
      Raw Text → Character Filters → Tokenizer → Token Filters → Terms

      Terms   Doc IDs
      brown   1
      dog     1
      fox     1
      jump    1
      lazy    1
      …
      over    1
      quick   1
      uber    1

  44. Elasticsearch Analyzer Chain
      Raw Text → Character Filters → Tokenizer → Token Filters → Terms

      Terms   Doc IDs
      brown   1, 3, 6, …
      dog     1, 2, 12, …
      fox     1, 5, 7, …
      jump    1, 6, …
      lazy    1, 7, …
      …       3, 6, 7, …
      over    1, 3, 5, 6, …
      quick   1, 4, …
      uber    1, …
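
The analyzer chain walked through on the slides above can be imitated in a few lines of plain PHP. This is only a toy sketch to make the stages concrete: the function names `toy_analyze` and `toy_index`, the tiny stop word list, the plural-stripping "stemmer", and the second document are all assumptions made up for illustration, and none of it is how Elasticsearch is actually implemented.

```php
<?php
// Toy analyzer chain: character filters -> tokenizer -> token filters -> terms.
// Hypothetical helper names; this only mimics the stages shown on the slides.

function toy_analyze(string $raw): array
{
    // Character filters: strip HTML tags and decode entities such as &uuml;.
    $text = html_entity_decode(strip_tags($raw), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Tokenizer: split on anything that is not a letter or a digit.
    $tokens = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Token filters: fold a few accented letters to ASCII, lowercase,
    // strip a trailing "s" as a crude stemmer, then drop stop words.
    $fold = ['ü' => 'u', 'Ü' => 'U', 'ö' => 'o', 'Ö' => 'O', 'ä' => 'a', 'Ä' => 'A'];
    $stop = ['the', 'a', 'an', 'and', 'of'];
    $terms = [];
    foreach ($tokens as $token) {
        $term = strtolower(strtr($token, $fold));
        if (strlen($term) > 3 && substr($term, -1) === 's') {
            $term = substr($term, 0, -1); // "jumps" becomes "jump"
        }
        if (!in_array($term, $stop, true)) {
            $terms[] = $term;
        }
    }
    return $terms;
}

// Inverted index: each term maps to the IDs of the documents containing it.
function toy_index(array $docs): array
{
    $index = [];
    foreach ($docs as $id => $raw) {
        foreach (array_unique(toy_analyze($raw)) as $term) {
            $index[$term][] = $id;
        }
    }
    ksort($index);
    return $index;
}

$index = toy_index([
    1 => '<p>The &uuml;ber-quick brown fox jumps over the lazy dogs.</p>',
    2 => 'Dogs and cats', // hypothetical second document
]);
print_r($index);
```

Looking up a term in `$index` returns the documents that contain it without scanning any post content, which is the contrast with the `LIKE "%…%"` queries earlier in the deck.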
  45. Understands Language Through Analyzers

  46. Understands Relevancy

  47. Relevancy Score? TF-IDF

  48. Relevancy Score? Term Frequency × Inverse Document Frequency

  49. Relevancy Score? Term Frequency × Inverse Document Frequency

  50. Relevancy Score? Term Frequency × Inverse Document Frequency

  51. Relevancy Score? Term Frequency × Inverse Document Frequency
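
To make "Term Frequency × Inverse Document Frequency" concrete, here is a minimal sketch of the textbook formula in plain PHP. The function name `tf_idf` and the three-document corpus are made up for illustration, and the scoring Elasticsearch really uses adds normalization and other factors on top of this idea.

```php
<?php
// Textbook TF-IDF for one term in one document (hypothetical helper).
// $docTerms is the analyzed term list of the document being scored;
// $corpus is a list of such term lists, one per document.
function tf_idf(string $term, array $docTerms, array $corpus): float
{
    // Term frequency: how often the term occurs in this document.
    $tf = 0;
    foreach ($docTerms as $t) {
        if ($t === $term) {
            $tf++;
        }
    }

    // Document frequency: how many documents contain the term at all.
    $df = 0;
    foreach ($corpus as $doc) {
        if (in_array($term, $doc, true)) {
            $df++;
        }
    }

    if ($tf === 0 || $df === 0) {
        return 0.0;
    }

    // Inverse document frequency: rare terms get a larger weight.
    return $tf * log(count($corpus) / $df);
}

// Made-up corpus of already analyzed posts.
$corpus = [
    ['wordpress', 'on', 'nginx', 'hhvm', 'heroku'],
    ['wordpress', 'on', 'apache'],
    ['wordpress', 'plugin', 'on', 'sale'],
];

$common = tf_idf('on', $corpus[0], $corpus);   // in every document
$rare   = tf_idf('hhvm', $corpus[0], $corpus); // in one document only
```

A word such as "on" that appears in every document scores zero, while "hhvm" scores well, which is why common words stop dominating the match.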

  52. WordPress on NGINX + HHVM with Heroku Buildpacks (same post excerpt as slide 11)
  53. WordPress on NGINX + HHVM with Heroku Buildpacks (same post excerpt as slide 11)
  58. Related Posts

  60. Related Posts for Power Users
      • Customize placement with the related posts shortcode
      • Change results or look and feel with various filters
      • Go completely wild with the related posts raw object
      IT'S OVER 9000!
  61. [jetpack-related-posts]

  62. <?php
      echo do_shortcode(
          '[jetpack-related-posts]'
      );
      ?>

  63. Related Posts for Power Users
      • Customize placement with the related posts shortcode
      • Change results or look and feel with various filters
      • Go completely wild with the related posts raw object
      IT'S OVER 9000!
  64. Look & Feel Configs

  65. jetpack_relatedposts_filter_options

  66. jetpack_relatedposts_filter_headline

  67. jetpack_relatedposts_filter_thumbnail_size
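
The three look and feel filters above might be hooked roughly as follows. This is a sketch under assumptions: the callback names are hypothetical, and the option keys (`size`, `show_thumbnails`) and the `width`/`height` shape of the thumbnail array are what I would expect Jetpack to pass, so verify them against the Jetpack version you run. The callbacks are plain functions and are only registered when WordPress is loaded.

```php
<?php
// Hypothetical callbacks for the look and feel filters named on the slides.

function my_rp_options(array $options): array
{
    $options['size'] = 6;               // assumed key: number of posts shown
    $options['show_thumbnails'] = true; // assumed key: show post images
    return $options;
}

function my_rp_headline(string $headline): string
{
    // Replace the default headline markup with our own.
    return sprintf(
        '<h3 class="jp-relatedposts-headline"><em>%s</em></h3>',
        htmlspecialchars('You might also like')
    );
}

function my_rp_thumbnail_size(array $size): array
{
    $size['width'] = 350;  // assumed array shape: width and height in pixels
    $size['height'] = 200;
    return $size;
}

// Register only when running inside WordPress.
if (function_exists('add_filter')) {
    add_filter('jetpack_relatedposts_filter_options', 'my_rp_options');
    add_filter('jetpack_relatedposts_filter_headline', 'my_rp_headline');
    add_filter('jetpack_relatedposts_filter_thumbnail_size', 'my_rp_thumbnail_size');
}
```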

  68. Fine Tune Matching Posts

  69. jetpack_relatedposts_filter_args
      array(
          'size' => 3,
          'post_type' => get_post_type(),
          'has_terms' => array(),
          'date_range' => array(),
          'exclude_post_ids' => array(),
      )

  70. jetpack_relatedposts_filter_args
      array(
          'size' => 3,
          'post_type' => get_post_type(),
          'has_terms' => array(),
          'date_range' => array(),
          'exclude_post_ids' => array(),
      )
  71. jetpack_relatedposts_filter_post_type
      jetpack_relatedposts_filter_args
      array(
          'size' => 3,
          'post_type' => get_post_type(),
          'has_terms' => array(),
          'date_range' => array(),
          'exclude_post_ids' => array(),
      )

  72. jetpack_relatedposts_filter_post_type
      jetpack_relatedposts_filter_args
      array(
          'size' => 3,
          'post_type' => array(
              'post',
              'awesome_sauce',
          ),
          'has_terms' => array(),
          'date_range' => array(),
          'exclude_post_ids' => array(),
      )
  73. jetpack_relatedposts_filter_has_terms
      jetpack_relatedposts_filter_args
      array(
          'size' => 3,
          'post_type' => get_post_type(),
          'has_terms' => array(),
          'date_range' => array(),
          'exclude_post_ids' => array(),
      )

  74. jetpack_relatedposts_filter_has_terms
      jetpack_relatedposts_filter_args
      array(
          'size' => 3,
          'post_type' => get_post_type(),
          'has_terms' => array(
              get_term_by( 'slug', 'devops', 'category' ),
              get_term_by( 'slug', 'hhvm', 'post_tag' ),
          ),
          'date_range' => array(),
          'exclude_post_ids' => array(),
      )
  75. jetpack_relatedposts_filter_date_range
      jetpack_relatedposts_filter_args
      array(
          'size' => 3,
          'post_type' => get_post_type(),
          'has_terms' => array(),
          'date_range' => array(),
          'exclude_post_ids' => array(),
      )

  76. jetpack_relatedposts_filter_date_range
      jetpack_relatedposts_filter_args
      array(
          'size' => 3,
          'post_type' => get_post_type(),
          'has_terms' => array(),
          'date_range' => array(
              'from' => strtotime( '-18 month' ),
              'to' => time(),
          ),
          'exclude_post_ids' => array(),
      )
  77. jetpack_relatedposts_filter_exclude_post_ids
      jetpack_relatedposts_filter_args
      array(
          'size' => 3,
          'post_type' => get_post_type(),
          'has_terms' => array(),
          'date_range' => array(),
          'exclude_post_ids' => array(),
      )

  78. jetpack_relatedposts_filter_exclude_post_ids
      jetpack_relatedposts_filter_args
      array(
          'size' => 3,
          'post_type' => get_post_type(),
          'has_terms' => array(),
          'date_range' => array(),
          'exclude_post_ids' => array(
              1,
              1337,
          ),
      )
  79. array(
          'size' => 3,
          'post_type' => get_post_type(),
          'has_terms' => array(),
          'date_range' => array(),
          'exclude_post_ids' => array(),
      )
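
The per-argument filters on the preceding slides could be wired up along these lines. Treat it as a sketch: the callback names are hypothetical, and the second `$post_id` parameter is an assumption about what Jetpack passes, so it defaults to 0 in case only the value arrives. The values themselves are the ones shown on the slides.

```php
<?php
// Hypothetical callbacks that reproduce the argument values from the slides.

function my_rp_post_type($post_type, $post_id = 0)
{
    // Match regular posts plus the custom post type from the slide.
    return array('post', 'awesome_sauce');
}

function my_rp_date_range($date_range, $post_id = 0)
{
    // Only consider posts from the last 18 months.
    return array(
        'from' => strtotime('-18 month'),
        'to'   => time(),
    );
}

function my_rp_exclude_post_ids($exclude_post_ids, $post_id = 0)
{
    // Keep whatever was already excluded and add two more IDs.
    $merged = array_merge((array) $exclude_post_ids, array(1, 1337));
    return array_values(array_unique($merged));
}

// Register only when running inside WordPress.
if (function_exists('add_filter')) {
    add_filter('jetpack_relatedposts_filter_post_type', 'my_rp_post_type', 10, 2);
    add_filter('jetpack_relatedposts_filter_date_range', 'my_rp_date_range', 10, 2);
    add_filter('jetpack_relatedposts_filter_exclude_post_ids', 'my_rp_exclude_post_ids', 10, 2);
}
```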
  80. array(
          array(
              'term' => array( 'tag.slug' => 'hhvm' )
          ),
          array(
              'not' => array(
                  'term' => array( 'post_id' => 1337 )
              )
          ),
          …
      )

  81. jetpack_relatedposts_filter_filters
      array(
          array(
              'term' => array( 'tag.slug' => 'hhvm' )
          ),
          array(
              'not' => array(
                  'term' => array( 'post_id' => 1337 )
              )
          ),
          …
      )

  82. jetpack_relatedposts_filter_filters
      array(
          array(
              'term' => array( 'tag.slug' => 'hhvm' )
          ),
          array(
              'not' => array(
                  'term' => array( 'post_id' => 1337 )
              )
          ),
          …
      )
      developer.wordpress.com/docs/elasticsearch
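
A callback for `jetpack_relatedposts_filter_filters` might append raw Elasticsearch filters like the two shown above. The callback name is hypothetical and the `$post_id` parameter is an assumption; the filter structures are copied from the slides.

```php
<?php
// Hypothetical callback: append two Elasticsearch filters to whatever
// Jetpack has already built from the arguments.
function my_rp_es_filters($filters, $post_id = 0)
{
    $filters = (array) $filters;

    // Only match posts tagged "hhvm".
    $filters[] = array(
        'term' => array('tag.slug' => 'hhvm'),
    );

    // Never return post 1337.
    $filters[] = array(
        'not' => array(
            'term' => array('post_id' => 1337),
        ),
    );

    return $filters;
}

// Register only when running inside WordPress.
if (function_exists('add_filter')) {
    add_filter('jetpack_relatedposts_filter_filters', 'my_rp_es_filters', 10, 2);
}
```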
  83. Manipulating Results

  84. jetpack_relatedposts_filter_hits
      array(
          array( 'id' => 1337 ),
          array( 'id' => 631 ),
          array( 'id' => 1771 ),
          array( 'id' => 20 ),
          array( 'id' => 1491 ),
      )
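
Because the hits are just a list of `array( 'id' => … )` entries, a callback can reorder them. The sketch below pins one post to the top; the callback name, the pinned ID, and the `$post_id` parameter are assumptions for illustration.

```php
<?php
// Hypothetical callback: move one post to the front of the hits while
// keeping the number of results unchanged.
function my_rp_pin_hit($hits, $post_id = 0)
{
    $pinned = 20; // hypothetical post ID to promote
    $hits = (array) $hits;

    if ((int) $post_id === $pinned) {
        return $hits; // do not recommend a post to itself
    }

    // Drop the pinned post from wherever it currently sits.
    $rest = array_filter($hits, function ($hit) use ($pinned) {
        return (int) $hit['id'] !== $pinned;
    });

    // Put it first, then trim back to the original number of hits.
    array_unshift($rest, array('id' => $pinned));
    return array_slice($rest, 0, max(1, count($hits)));
}

// Register only when running inside WordPress.
if (function_exists('add_filter')) {
    add_filter('jetpack_relatedposts_filter_hits', 'my_rp_pin_hit', 10, 2);
}
```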
  85. jetpack_relatedposts_filter_post_context

  86. jetpack_relatedposts_returned_results

  87. jetpack_relatedposts_returned_results
      [
        {
          "id": 1771,
          "url": "http://xyu.io/2013/08/summer/",
          "url_meta": { "origin": 2361, "position": 0 },
          "title": "Summer!",
          "format": false,
          "excerpt": "The cats of summer…",
          "context": "In 'cat pictures'",
          "img": {
            "src": "http://xyu.io/2013/08/summer.jpg",
            "width": 350, "height": 200
          }
        },
        …
      ]
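
A callback on `jetpack_relatedposts_returned_results` can post-process the fully built results. The sketch assumes each result reaches PHP as an associative array with the keys shown in the JSON above; the callback name and the `$post_id` parameter are hypothetical.

```php
<?php
// Hypothetical callback: keep only results that have an image and hide
// the context line (for example "In 'cat pictures'").
function my_rp_only_with_images($results, $post_id = 0)
{
    $kept = array();
    foreach ((array) $results as $result) {
        if (!empty($result['img']['src'])) {
            $result['context'] = '';
            $kept[] = $result;
        }
    }
    return $kept;
}

// Register only when running inside WordPress.
if (function_exists('add_filter')) {
    add_filter('jetpack_relatedposts_returned_results', 'my_rp_only_with_images', 10, 2);
}
```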
  88. Related Posts for Power Users
      • Customize placement with the related posts shortcode
      • Change results or look and feel with various filters
      • Go completely wild with the related posts raw object
      IT'S OVER 9000!
  89. Using the Related Posts Raw Object
      $related = Jetpack_RelatedPosts::init_raw()
          ->set_query_name( 'my_rp' ) // Optional
          ->get_for_post_id(
              $post_id, // For post_id
              5, // Get 5 results
              array( // ES filters
                  array(
                      'term' => array( 'tag.slug' => 'hhvm' )
                  ),
                  …
              )
          )
      developer.wordpress.com/docs/elasticsearch
  90. Using the Related Posts Raw Object
      $related = array(
          array( 'id' => 1337 ),
          array( 'id' => 631 ),
          array( 'id' => 1771 ),
          array( 'id' => 20 ),
          array( 'id' => 1491 ),
      )
      developer.wordpress.com/docs/elasticsearch
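
Since the raw object hands back only post IDs, the rest is ordinary WordPress. One possible next step, sketched here with a hypothetical helper name, is to flatten the hits and load the posts in relevance order with the core `get_posts()` function. The `$related` array below is hard-coded to match the slide.

```php
<?php
// Hypothetical helper: flatten the raw hits into a plain list of post IDs.
function my_rp_hit_ids(array $related): array
{
    return array_map('intval', array_column($related, 'id'));
}

// Sample data in the shape shown on the slide.
$related = array(
    array('id' => 1337),
    array('id' => 631),
    array('id' => 1771),
    array('id' => 20),
    array('id' => 1491),
);

$ids = my_rp_hit_ids($related);

// Inside WordPress, load the posts while preserving the relevance order.
// The guard on $ids matters: an empty post__in would not restrict the query.
if ($ids && function_exists('get_posts')) {
    $posts = get_posts(array(
        'post__in'       => $ids,
        'orderby'        => 'post__in',
        'post_type'      => 'any',
        'posts_per_page' => count($ids),
    ));
}
```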
  91. Related Posts

  92. Thanks! Code Wrangler — Automattic @HypertextRanch me@xyu.io xyu.io xyu 

       