Jetpack Related Posts for Power Users

xyu
August 16, 2014

The related posts module in Jetpack looks simple at first glance; however, underneath it’s powered by Elasticsearch, an advanced natural-language search engine. Come along as we peel back the covers to explain how Jetpack uses Elasticsearch to determine what’s related and, more importantly, how to customize it to take full advantage of Elasticsearch’s text analysis abilities.

Transcript

  1. Jetpack Related Posts for Power Users
    WordCamp Maine (August 2014)

  2. The 13 filters you won't believe are in related posts
    WordCamp Maine (August 2014)

  3. Xiao Yu
    Code Wrangler — Automattic

    @HypertextRanch

    xyu.io

    xyu

  4. Related Posts

  5-8. (no transcript text)

  9. Because finding related content is a hard
    problem to solve.

  10. (no transcript text)

  11. WordPress on NGINX + HHVM with Heroku Buildpacks

    It’s been a year since I last made any major changes to my WordPress on Heroku build, and in
    tech years that’s a lifetime. Since then Heroku has released a new PHP buildpack with nginx and
    HHVM built in. Much progress has also been made in both HHVM and WordPress to make them
    compatible with each other. So it seems like now is as good a time as any to update the stack
    this site is running on. Without further ado, I’d like to introduce Heroku WP: a template for
    HHVM-powered WordPress served by nginx.

    The Goal

    There are numerous other templates out there for running WordPress on Heroku, and my main
    goals for this template are:
    • It should be simple: use the default buildpack provided by Heroku so there’s no other
      third-party dependency to implicitly trust or to maintain.
    • It should be fast: use the latest technologies available to squeeze every last ounce of
      performance out of each Heroku Dyno.
    • It should be secure: security is not an add-on; admin pages should be secure by default and
      database connections need to be encrypted.
    • It should scale: just because we can serve millions of page hits a day off a single Heroku
      Dyno does not mean we’ll stop there. The template should be made with cloud architecture in
      mind so that the number of Dynos can scale up and down without breaking.

    The Stack

    Standing on the shoulders of giants, I was able to use the latest Heroku buildpack and get
    WordPress running on:
    • NGINX: an event-driven web server that was engineered for the modern day to replace Apache.
      This high-performance web server is preferred by more top 1,000 sites than any other, and
      it’s what’s used by the largest WordPress install out there, WordPress.com.
    • HHVM: the HipHop Virtual Machine, a JIT (just-in-time) compiler developed by Facebook to run
      PHP scripts, which when tested with WordPress showed up to a 2x improvement. I have yet to
      run any statistical analysis on performance; however, anecdotally it feels a lot faster
      navigating WP admin, and page generation times look much better. I’m looking forward to
      running more tests and performance tuning this build in the coming weeks.

    Update: While still not a head-to-head test, looking at the response times reported by
    StatusCake for this site running on Heroku-WP and for a mirror of this site running on the old
    Heroku LAMP stack, with no load other than StatusCake pings, shows a dramatic improvement:

  12. SELECT COUNT(*) FROM wp_posts WHERE post_content LIKE '%WordPress%';
    SELECT COUNT(*) FROM wp_posts WHERE post_content LIKE '%on%';
    SELECT COUNT(*) FROM wp_posts WHERE post_content LIKE '%NGINX%';
    SELECT COUNT(*) FROM wp_posts WHERE post_content LIKE '%improvement%';

  13. NOPE

  14. (Same blog-post text as slide 11.)

  15. (Same blog-post text as slide 11.)

  16. SELECT *
    FROM wp_posts
    WHERE
        post_content LIKE '%WordPress%' OR
        post_content LIKE '%NGINX%' OR
        post_content LIKE '%HHVM%' OR
        post_content LIKE '%Heroku%' OR
        post_content LIKE '%performance%';

  17. SELECT *
    FROM wp_posts
    WHERE
        post_content LIKE '%WordPress%' OR
        post_content LIKE '%NGINX%' OR
        post_content LIKE '%HHVM%' OR
        post_content LIKE '%Heroku%' OR
        post_content LIKE '%performance%'
    ORDER BY !?

  18. (no transcript text)

  19. “Let’s just use tags and categories.”
    –Some Plugin Author

  20. “Everyone tags & categorizes everything perfectly.”
    –Some SEO Consultant’s Dream

  21. NOPE

  22. Ok, it’s a hard problem to solve.

  23. WE CAN FIX IT WITH SCIENCE!

  24. elasticsearch

  25. Understands Language

  26. Analyzing Text

  27. Elasticsearch Analyzer Chain
    Raw Text → Character Filters → Tokenizer → Token Filters → Terms

  28. Elasticsearch Analyzer Chain
    Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    “The über-quick brown fox jumps over the lazy dogs.”

  29. Elasticsearch Analyzer Chain
    Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    The über-quick brown fox jumps over the lazy dogs.

  30. Elasticsearch Analyzer Chain
    Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    The über-quick brown fox jumps over the lazy dogs.

  31. Elasticsearch Analyzer Chain
    Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    The über-quick brown fox jumps over the lazy dogs.

  32. Elasticsearch Analyzer Chain
    Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    The über-quick brown fox jumps over the lazy dogs.

  33. Elasticsearch Analyzer Chain
    Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    The über quick brown fox jumps over the lazy dogs

  34. Elasticsearch Analyzer Chain
    Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    [The] [über] [quick] [brown] [fox] [jumps] [over] [the] [lazy] [dogs]

  35. Elasticsearch Analyzer Chain
    Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    [The] [über] [quick] [brown] [fox] [jumps] [over] [the] [lazy] [dogs]

  36. Elasticsearch Analyzer Chain
    Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    [the] [über] [quick] [brown] [fox] [jumps] [over] [the] [lazy] [dogs]

  37. Elasticsearch Analyzer Chain
    Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    [the] [über] [quick] [brown] [fox] [jumps] [over] [the] [lazy] [dogs]

  38. Elasticsearch Analyzer Chain
    Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    [the] [uber] [quick] [brown] [fox] [jumps] [over] [the] [lazy] [dogs]

  39. Elasticsearch Analyzer Chain
    Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    [the] [uber] [quick] [brown] [fox] [jumps] [over] [the] [lazy] [dogs]

  40. Elasticsearch Analyzer Chain
    Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    [the] [uber] [quick] [brown] [fox] [jump] [over] [the] [lazy] [dog]

  41. Elasticsearch Analyzer Chain
    Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    [the] [uber] [quick] [brown] [fox] [jump] [over] [the] [lazy] [dog]

  42. Elasticsearch Analyzer Chain
    Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    [uber] [quick] [brown] [fox] [jump] [over] [lazy] [dog]
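The analyzer chain walked through above can be approximated in plain PHP to make each stage concrete. This is a hedged toy sketch, not how Elasticsearch implements analysis: the punctuation-stripping character filter, the hand-written folding table, the strip-a-trailing-"s" stemmer, and the three-word stop list are all assumptions chosen so the slides' example sentence comes out the same way. It assumes PHP 7.4+ (arrow functions) with the mbstring extension; every function name is hypothetical.

```php
<?php
// Toy analyzer chain: raw text -> character filter -> tokenizer -> token filters -> terms.
// Each stage is a deliberately simplified stand-in for the real Elasticsearch component.

// Character filter (assumed behaviour): turn punctuation such as "-" and "." into spaces.
function char_filter( $text ) {
    return preg_replace( '/[^\p{L}\p{N}\s]+/u', ' ', $text );
}

// Tokenizer: split on runs of whitespace, dropping empty pieces.
function tokenize( $text ) {
    return preg_split( '/\s+/u', trim( $text ), -1, PREG_SPLIT_NO_EMPTY );
}

// Token filter 1: lowercase (multibyte-safe).
function lowercase_filter( array $tokens ) {
    return array_map( fn( $t ) => mb_strtolower( $t, 'UTF-8' ), $tokens );
}

// Token filter 2: ASCII folding with a tiny hand-written table (assumption: covers a few letters only).
function ascii_folding_filter( array $tokens ) {
    $map = array( 'ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'é' => 'e', 'è' => 'e' );
    return array_map( fn( $t ) => strtr( $t, $map ), $tokens );
}

// Token filter 3: crude stemmer that strips a plural "s" (assumption: not a real stemming algorithm).
function naive_stem_filter( array $tokens ) {
    return array_map(
        fn( $t ) => ( mb_strlen( $t ) > 3 && substr( $t, -1 ) === 's' ) ? substr( $t, 0, -1 ) : $t,
        $tokens
    );
}

// Token filter 4: stop-word removal with a minimal, assumed stop list.
function stop_filter( array $tokens ) {
    $stop = array( 'a', 'an', 'the' );
    return array_values( array_filter( $tokens, fn( $t ) => ! in_array( $t, $stop, true ) ) );
}

// Run the whole chain in the order shown on the slides.
function analyze( $text ) {
    $tokens = tokenize( char_filter( $text ) );
    return stop_filter( naive_stem_filter( ascii_folding_filter( lowercase_filter( $tokens ) ) ) );
}

echo implode( ' ', analyze( 'The über-quick brown fox jumps over the lazy dogs.' ) ), "\n";
// uber quick brown fox jump over lazy dog
```

Real analyzers swap in proper components for each stage (a language-aware stemmer, a much longer stop list), but the shape of the pipeline is the same: one string in, a list of normalized terms out.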

  43. Elasticsearch Analyzer Chain
    Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    Term     Doc IDs
    brown    1
    dog      1
    fox      1
    jump     1
    lazy     1
    over     1
    quick    1
    uber     1

  44. Elasticsearch Analyzer Chain
    Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    Term     Doc IDs
    brown    1, 3, 6, …
    dog      1, 2, 12, …
    fox      1, 5, 7, …
    jump     1, 6, …
    lazy     1, 7, …
    …        3, 6, 7, …
    over     1, 3, 5, 6, …
    quick    1, 4, …
    uber     1, …
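A minimal sketch of the term-to-document table on slides 43 and 44, assuming documents have already been analyzed into terms. Documents 2 and 3 and both function names are hypothetical; the real index Elasticsearch builds on Lucene stores far more, such as per-document term frequencies and positions.

```php
<?php
// Toy inverted index: maps each term to the sorted list of document IDs that contain it.
function build_inverted_index( array $docs ) {
    $index = array();
    foreach ( $docs as $doc_id => $terms ) {
        // A document is listed once per term, however often the term occurs in it.
        foreach ( array_unique( $terms ) as $term ) {
            $index[ $term ][] = $doc_id;
        }
    }
    ksort( $index ); // alphabetical term dictionary, as on the slide
    foreach ( $index as $term => $postings ) {
        sort( $postings ); // ascending document IDs
        $index[ $term ] = $postings;
    }
    return $index;
}

// An OR query is simply the union of the postings lists of its terms.
function match_any( array $index, array $terms ) {
    $ids = array();
    foreach ( $terms as $term ) {
        $ids = array_merge( $ids, $index[ $term ] ?? array() );
    }
    $ids = array_values( array_unique( $ids ) );
    sort( $ids );
    return $ids;
}

// Document 1 is the analyzed sentence from the slides; 2 and 3 are made-up examples.
$docs = array(
    1 => array( 'uber', 'quick', 'brown', 'fox', 'jump', 'over', 'lazy', 'dog' ),
    2 => array( 'lazy', 'cat' ),
    3 => array( 'brown', 'bear' ),
);
$index = build_inverted_index( $docs );
echo implode( ', ', $index['brown'] ), "\n"; // 1, 3
```

Looking up a term is a single array access instead of a scan over every post's content, which is the key difference from the `LIKE '%term%'` queries on slides 12 and 16.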

  45. Understands Language Through Analyzers

  46. Understands Relevancy

  47. Relevancy Score?
    TF-IDF

  48. Relevancy Score?
    Term Frequency × Inverse Document Frequency

  49. Relevancy Score?
    Term Frequency × Inverse Document Frequency

  50. Relevancy Score?
    Term Frequency × Inverse Document Frequency

  51. Relevancy Score?
    Term Frequency × Inverse Document Frequency
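To make Term Frequency × Inverse Document Frequency concrete, here is a textbook TF-IDF scorer, score(t, d) = tf(t, d) × log(N / df(t)), summed over the query terms, where N is the number of documents and df(t) the number containing t. It is a sketch under assumptions: the four posts and all function names are made up, and Lucene's classic TF-IDF similarity, which Elasticsearch used by default at the time, applies further normalizations, so its real scores will differ.

```php
<?php
// Term frequency: how many times the term occurs in one document.
function term_frequency( $term, array $doc_terms ) {
    return count( array_keys( $doc_terms, $term, true ) );
}

// Inverse document frequency: rare terms get large weights, terms found everywhere get 0.
function inverse_document_frequency( $term, array $corpus ) {
    $df = 0;
    foreach ( $corpus as $terms ) {
        if ( in_array( $term, $terms, true ) ) {
            $df++;
        }
    }
    // A term that appears nowhere contributes nothing (and avoids dividing by zero).
    return 0 === $df ? 0.0 : log( count( $corpus ) / $df );
}

// Score a document against a query by summing tf * idf over the query terms.
function tf_idf_score( array $query_terms, array $doc_terms, array $corpus ) {
    $score = 0.0;
    foreach ( $query_terms as $term ) {
        $score += term_frequency( $term, $doc_terms ) * inverse_document_frequency( $term, $corpus );
    }
    return $score;
}

// Hypothetical, already-analyzed posts.
$corpus = array(
    'hhvm-post'  => array( 'wordpress', 'on', 'nginx', 'hhvm', 'hhvm', 'heroku' ),
    'nginx-post' => array( 'wordpress', 'on', 'nginx', 'config' ),
    'cat-post'   => array( 'wordpress', 'on', 'summer', 'cat' ),
    'theme-post' => array( 'wordpress', 'on', 'theme' ),
);

$query  = array( 'wordpress', 'on', 'nginx', 'hhvm' );
$scores = array();
foreach ( $corpus as $post => $terms ) {
    $scores[ $post ] = tf_idf_score( $query, $terms, $corpus );
}
arsort( $scores ); // highest score first
print_r( $scores );
```

Because every post contains "wordpress" and "on", their IDF is log(4/4) = 0 and they add nothing to any score, while the rare term "hhvm" dominates. That is exactly the ranking the OR query on slide 17 has no way to express.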

  52. (Same blog-post text as slide 11.)

  53. (Same blog-post text as slide 11.)

  54-58. (no transcript text)

  59. Related Posts

  60. (no transcript text)

  61. Related Posts for Power Users
    • Customize placement with the related posts shortcode
    • Change results or look and feel with various filters
    • Go completely wild with the related posts raw object
    IT'S OVER 9000!

  62. [jetpack-related-posts]

  63. <?php echo do_shortcode( '[jetpack-related-posts]' ); ?>

  64. Related Posts for Power Users
    • Customize placement with the related posts shortcode
    • Change results or look and feel with various filters
    • Go completely wild with the related posts raw object
    IT'S OVER 9000!

  65. Look & Feel Configs

  66. jetpack_relatedposts_filter_options

  67. jetpack_relatedposts_filter_headline

  68. jetpack_relatedposts_filter_thumbnail_size
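A hedged sketch of hooking the three look-and-feel filters from a theme's functions.php or a small plugin. The callback names are hypothetical, and the array keys ('size' in the options array, 'width'/'height' in the thumbnail size) as well as the headline's CSS class are assumptions modeled on Jetpack's support examples; check them against the Jetpack version in use. Registration is wrapped in `function_exists( 'add_filter' )` only so the callbacks can also be exercised outside WordPress.

```php
<?php
// Show six related posts instead of the default three ('size' key assumed).
function my_rp_options( $options ) {
    $options['size'] = 6;
    return $options;
}

// Replace the headline markup above the related posts (CSS class assumed from Jetpack's default).
function my_rp_headline( $headline ) {
    return '<h3 class="jp-relatedposts-headline"><em>Keep reading</em></h3>';
}

// Request differently sized thumbnails ('width'/'height' keys assumed).
function my_rp_thumbnail_size( $size ) {
    $size['width']  = 300;
    $size['height'] = 200;
    return $size;
}

// Register the callbacks only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'jetpack_relatedposts_filter_options', 'my_rp_options' );
    add_filter( 'jetpack_relatedposts_filter_headline', 'my_rp_headline' );
    add_filter( 'jetpack_relatedposts_filter_thumbnail_size', 'my_rp_thumbnail_size' );
}
```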

  69. Fine Tune Matching Posts

  70. jetpack_relatedposts_filter_args
    array(
        'size'             => 3,
        'post_type'        => get_post_type(),
        'has_terms'        => array(),
        'date_range'       => array(),
        'exclude_post_ids' => array(),
    )

  71. jetpack_relatedposts_filter_args
    array(
        'size'             => 3,
        'post_type'        => get_post_type(),
        'has_terms'        => array(),
        'date_range'       => array(),
        'exclude_post_ids' => array(),
    )

  72. jetpack_relatedposts_filter_post_type
    jetpack_relatedposts_filter_args
    array(
        'size'             => 3,
        'post_type'        => get_post_type(),
        'has_terms'        => array(),
        'date_range'       => array(),
        'exclude_post_ids' => array(),
    )

  73. jetpack_relatedposts_filter_post_type
    jetpack_relatedposts_filter_args
    array(
        'size'             => 3,
        'post_type'        => array(
            'post',
            'awesome_sauce',
        ),
        'has_terms'        => array(),
        'date_range'       => array(),
        'exclude_post_ids' => array(),
    )

  74. jetpack_relatedposts_filter_has_terms
    jetpack_relatedposts_filter_args
    array(
        'size'             => 3,
        'post_type'        => get_post_type(),
        'has_terms'        => array(),
        'date_range'       => array(),
        'exclude_post_ids' => array(),
    )

  75. jetpack_relatedposts_filter_has_terms
    jetpack_relatedposts_filter_args
    array(
        'size'             => 3,
        'post_type'        => get_post_type(),
        'has_terms'        => array(
            get_term_by( 'slug', 'devops', 'category' ),
            get_term_by( 'slug', 'hhvm', 'post_tag' ),
        ),
        'date_range'       => array(),
        'exclude_post_ids' => array(),
    )

  76. jetpack_relatedposts_filter_date_range
    jetpack_relatedposts_filter_args
    array(
        'size'             => 3,
        'post_type'        => get_post_type(),
        'has_terms'        => array(),
        'date_range'       => array(),
        'exclude_post_ids' => array(),
    )

  77. jetpack_relatedposts_filter_date_range
    jetpack_relatedposts_filter_args
    array(
        'size'             => 3,
        'post_type'        => get_post_type(),
        'has_terms'        => array(),
        'date_range'       => array(
            'from' => strtotime( '-18 month' ),
            'to'   => time(),
        ),
        'exclude_post_ids' => array(),
    )

  78. jetpack_relatedposts_filter_exclude_post_ids
    jetpack_relatedposts_filter_args
    array(
        'size'             => 3,
        'post_type'        => get_post_type(),
        'has_terms'        => array(),
        'date_range'       => array(),
        'exclude_post_ids' => array(),
    )

  79. jetpack_relatedposts_filter_exclude_post_ids
    jetpack_relatedposts_filter_args
    array(
        'size'             => 3,
        'post_type'        => get_post_type(),
        'has_terms'        => array(),
        'date_range'       => array(),
        'exclude_post_ids' => array(
            1,
            1337,
        ),
    )
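The per-argument filters on slides 72 to 79 can be hooked individually instead of replacing the whole `jetpack_relatedposts_filter_args` array. A minimal sketch with hypothetical callback names: each callback receives the current value and returns the new one. Jetpack is assumed to pass the post ID as an extra argument, which these callbacks ignore (they are registered with WordPress's default of one accepted argument). The values mirror the slides.

```php
<?php
// Also match the custom post type from the slide, keeping whatever type was already set.
function my_rp_post_types( $post_type ) {
    return array_values( array_unique( array_merge( (array) $post_type, array( 'post', 'awesome_sauce' ) ) ) );
}

// Only consider posts published in the last 18 months.
function my_rp_date_range( $date_range ) {
    return array(
        'from' => strtotime( '-18 months' ),
        'to'   => time(),
    );
}

// Never recommend these post IDs (example values from the slide), on top of existing exclusions.
function my_rp_exclude_ids( $exclude_post_ids ) {
    return array_values( array_unique( array_merge( (array) $exclude_post_ids, array( 1, 1337 ) ) ) );
}

// Register the callbacks only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'jetpack_relatedposts_filter_post_type', 'my_rp_post_types' );
    add_filter( 'jetpack_relatedposts_filter_date_range', 'my_rp_date_range' );
    add_filter( 'jetpack_relatedposts_filter_exclude_post_ids', 'my_rp_exclude_ids' );
}
```

Merging rather than overwriting keeps any exclusions or post types that other code has already added.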

  80. array(
        'size'             => 3,
        'post_type'        => get_post_type(),
        'has_terms'        => array(),
        'date_range'       => array(),
        'exclude_post_ids' => array(),
    )

  81. array(
        array(
            'term' => array( 'tag.slug' => 'hhvm' )
        ),
        array(
            'not' => array(
                'term' => array( 'post_id' => 1337 )
            )
        ),
        …
    )

  82. array(
        array(
            'term' => array( 'tag.slug' => 'hhvm' )
        ),
        array(
            'not' => array(
                'term' => array( 'post_id' => 1337 )
            )
        ),
        …
    )
    jetpack_relatedposts_filter_filters

  83. array(
        array(
            'term' => array( 'tag.slug' => 'hhvm' )
        ),
        array(
            'not' => array(
                'term' => array( 'post_id' => 1337 )
            )
        ),
        …
    )
    jetpack_relatedposts_filter_filters
    developer.wordpress.com/docs/elasticsearch
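A sketch of a `jetpack_relatedposts_filter_filters` callback that appends the two clauses shown on slides 81 to 83 to whatever Jetpack already built. The callback name is hypothetical; the clause syntax, including the `'not'` filter, is copied from the slides and reflects the Elasticsearch filter DSL of that era, so consult developer.wordpress.com/docs/elasticsearch for the fields and syntax actually supported.

```php
<?php
// Append raw Elasticsearch filter clauses to the list Jetpack passes in.
function my_rp_es_filters( $filters ) {
    $filters = (array) $filters;

    // Only match posts tagged "hhvm".
    $filters[] = array( 'term' => array( 'tag.slug' => 'hhvm' ) );

    // Never match the post with ID 1337.
    $filters[] = array(
        'not' => array(
            'term' => array( 'post_id' => 1337 ),
        ),
    );

    return $filters;
}

// Register the callback only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'jetpack_relatedposts_filter_filters', 'my_rp_es_filters' );
}
```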

  84. Manipulating Results

  85. jetpack_relatedposts_filter_hits
    array(
        array( 'id' => 1337 ),
        array( 'id' => 631 ),
        array( 'id' => 1771 ),
        array( 'id' => 20 ),
        array( 'id' => 1491 ),
    )
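Because the hits are just an ordered list of `array( 'id' => post ID )` entries, a `jetpack_relatedposts_filter_hits` callback can reorder, drop, or add posts before Jetpack presumably turns them into the final results. This hypothetical sketch promotes one post (ID 631, picked arbitrarily from the slide's sample list) to the top whenever Elasticsearch returned it, and otherwise leaves the ranking untouched.

```php
<?php
// Hypothetical: the post to promote whenever it appears among the hits.
const MY_RP_PINNED_POST_ID = 631;

// Move the pinned post to the front; all other hits keep their relative order.
function my_rp_pin_post( $hits ) {
    $pinned = array();
    $others = array();
    foreach ( (array) $hits as $hit ) {
        if ( (int) $hit['id'] === MY_RP_PINNED_POST_ID ) {
            $pinned[] = $hit;
        } else {
            $others[] = $hit;
        }
    }
    return array_merge( $pinned, $others );
}

// Register the callback only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'jetpack_relatedposts_filter_hits', 'my_rp_pin_post' );
}
```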

  86. jetpack_relatedposts_filter_post_context

  87. jetpack_relatedposts_returned_results

  88. jetpack_relatedposts_returned_results
    [ {
        "id": 1771,
        "url": "http://xyu.io/2013/08/summer/",
        "url_meta": { "origin": 2361, "position": 0 },
        "title": "Summer!",
        "format": false,
        "excerpt": "The cats of summer…",
        "context": "In 'cat pictures'",
        "img": {
            "src": "http://xyu.io/2013/08/summer.jpg",
            "width": 350, "height": 200
        }
    }, … ]
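`jetpack_relatedposts_returned_results` operates on the fully built result list. As a hedged sketch, assuming each result is an associative array shaped like the JSON on slide 88, this hypothetical callback keeps only results that come with a thumbnail image.

```php
<?php
// Keep only related posts whose 'img' entry has a non-empty 'src'.
function my_rp_require_images( $results ) {
    $with_images = array_filter(
        (array) $results,
        function ( $result ) {
            return ! empty( $result['img']['src'] );
        }
    );
    return array_values( $with_images ); // re-index so the list stays 0, 1, 2, ...
}

// Register the callback only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'jetpack_relatedposts_returned_results', 'my_rp_require_images' );
}
```

Dropping results this late means fewer posts than requested may be shown; requesting a larger 'size' up front is one way to compensate.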

  89. Related Posts for Power Users
    • Customize placement with the related posts shortcode
    • Change results or look and feel with various filters
    • Go completely wild with the related posts raw object
    IT'S OVER 9000!

  90. Using the Related Posts Raw Object
    $related = Jetpack_RelatedPosts::init_raw()
        ->set_query_name( 'my_rp' ) // Optional
        ->get_for_post_id(
            $post_id, // For post_id
            5,        // Get 5 results
            array(    // ES filters
                array(
                    'term' => array( 'tag.slug' => 'hhvm' )
                ),
                …
            )
        );
    developer.wordpress.com/docs/elasticsearch

  91. Using the Related Posts Raw Object
    $related = array(
        array( 'id' => 1337 ),
        array( 'id' => 631 ),
        array( 'id' => 1771 ),
        array( 'id' => 20 ),
        array( 'id' => 1491 ),
    );
    developer.wordpress.com/docs/elasticsearch
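One way to render the raw object's output yourself is to hand the returned IDs to a standard `WP_Query`. This sketch only builds the query arguments, so it runs without WordPress; the function name is hypothetical, while `post__in`, `'orderby' => 'post__in'`, `posts_per_page`, and `ignore_sticky_posts` are standard `WP_Query` arguments.

```php
<?php
// Turn the raw object's result, a list of array( 'id' => post ID ), into WP_Query arguments
// that keep Elasticsearch's relevance order.
function my_rp_query_args( array $related ) {
    $ids = array_map( 'intval', array_column( $related, 'id' ) );
    if ( empty( $ids ) ) {
        // WP_Query ignores an empty post__in, so use an ID that cannot exist to match nothing.
        $ids = array( 0 );
    }
    return array(
        'post__in'            => $ids,
        'orderby'             => 'post__in', // preserve the order the IDs were returned in
        'posts_per_page'      => count( $ids ),
        'ignore_sticky_posts' => true,
    );
}

$related = array(
    array( 'id' => 1337 ),
    array( 'id' => 631 ),
    array( 'id' => 1771 ),
);
print_r( my_rp_query_args( $related ) );
```

Inside WordPress this would be used as `$query = new WP_Query( my_rp_query_args( $related ) );`. The empty-list fallback matters because an empty `post__in` places no restriction at all, so the query would otherwise return the latest posts instead of nothing.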

  92. Related Posts

  93. Thanks!
    Code Wrangler — Automattic

    @HypertextRanch

    xyu.io

    xyu