WP ❤️ Elasticsearch

xyu
July 26, 2014

Searching in WordPress is not great. MySQL is a relational datastore, but most of our WP data is written language, which is hard to search in a relational store. Elasticsearch lets us bridge this gap and finally give WP the search it deserves.

Transcript

  1. WordPress ❤️ Elasticsearch WordCamp Vancouver (July 2014)

  2. Xiao Yu, Code Wrangler, Automattic · @HypertextRanch · me@xyu.io · xyu

      
  3. Photo by Abhimanyu (Manyu) Sabnis, www.shutterfeet.com

  4. elasticsearch

  5. –Andrew Oliver, InfoWorld “Ultra-hip Elasticsearch”

  6. Google Trends chart, January 2005 to January 2014 (series: lighttpd, nginx)
  7. –elasticsearch.org “Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine.”
  8. None
  9. Data Store vs. Data Store… Fight!

                      MySQL             Elasticsearch
    Type              RDBMS             Document Store
    Storage Model     Normalized        Denormalized
    Consistency       Strong            Eventual
    Replication       Master-Slave      Native HA
    Inserts           Fast              Slow
    Query Language    SQL               Expressive JSON
    Query Execution   Single Threaded   Distributed
    Analytics         Simple            Advanced
  10. 2 Data Stores

  11. 2 Data Stores: more complexity

  12. 2 Data Stores: more complexity
    2 Data Stores: more points of failure

  13. 2 Data Stores: more complexity
    2 Data Stores: more points of failure
    2 Data Stores: more cost

  14. Worth it?

  15. None
  16.  ❤️

  17. Open & Inclusive  ❤️

  18. Allows Us to Tinker & Inclusive  ❤️

  19. Allows Us to Tinker & Optimized to Run Anywhere* 

    ❤️ * Almost
  20. Openness & Inclusiveness  ❤️

  21. Expressing Ourselves with Written Language  ❤️

  22. howdy  ❤️

  23. مرحبا (“hello”)  ❤️

  24.  ❤️

  25. Written Language  ❤️

  26. Understands Strings

  27. Understands Language

  28. –My Blog to a Human “I almost ran into a swarm of baby ducks this morning…”
  29. –My Blog to MySQL
    0000000 49 20 61 6c 6d 6f 73 74 20 72 61 6e 20 69 6e 74
    0000010 6f 20 61 20 73 77 61 72 6d 20 6f 66 20 62 61 62
    0000020 79 20 64 75 63 6b 73 20 74 68 69 73 20 6d 6f 72
    0000030 6e 69 6e 67 e2 80 a6…
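To make "My Blog to MySQL" concrete, here is a small Python sketch that reproduces the byte dump above: to a plain string column the post is only a sequence of UTF-8 bytes (the trailing `e2 80 a6` is the "…" character). The 16-bytes-per-line layout with a 7-digit hex offset is chosen to match the slide; `hex_lines` is a hypothetical helper, not any particular hexdump tool.

```python
def hex_lines(text: str, width: int = 16) -> list[str]:
    """Render the UTF-8 bytes of `text` as 'offset byte byte ...' lines."""
    data = text.encode("utf-8")
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # 7-digit hex offset, then each byte as two lowercase hex digits.
        lines.append(f"{offset:07x} " + " ".join(f"{b:02x}" for b in chunk))
    return lines


post = "I almost ran into a swarm of baby ducks this morning…"
for line in hex_lines(post):
    print(line)
```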
  30. –Some Awesome Human “Man, I wonder what fantastic insights Xiao has about ducks.”
  31. SELECT *
    FROM wp_posts
    WHERE post_content LIKE '%ducks%';

  32. 0000000 49 20 61 6c 6d 6f 73 74 20 72 61 6e 20 69 6e 74
    0000010 6f 20 61 20 73 77 61 72 6d 20 6f 66 20 62 61 62
    0000020 79 20 64 75 63 6b 73 20 74 68 69 73 20 6d 6f 72
    0000030 6e 69 6e 67 e2 80 a6…
  34. “I almost ran into a swarm of baby ducks this morning…”
  35. “I almost ran into a swarm of baby ducks this morning…”
    SELECT *
    FROM wp_posts
    WHERE post_content LIKE '%running%';
  37. RUNNING != RAN
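A rough Python model of the `LIKE '%…%'` queries above. `WP_POSTS` is a hypothetical in-memory stand-in for `wp_posts`, and the case-insensitive comparison assumes a case-insensitive collation (common in MySQL, but it depends on the table). A raw substring test finds "ducks" but has no way to connect "running" to "ran".

```python
# Hypothetical stand-in for the wp_posts table: post ID -> post_content.
WP_POSTS = {
    1: "I almost ran into a swarm of baby ducks this morning…",
}


def like_search(needle: str) -> list[int]:
    """Mimic `post_content LIKE '%needle%'` as a case-insensitive substring test.

    Assumes a case-insensitive collation; `needle` is treated literally
    (no support for `%` or `_` wildcards inside it).
    """
    return [pid for pid, content in WP_POSTS.items()
            if needle.lower() in content.lower()]


print(like_search("ducks"))    # [1]
print(like_search("running"))  # [] because the post says "ran"
```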

  38. Understands Language

  39. Analyzing Text

  40. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
  41. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    “The über-quick brown fox jumps over the lazy dogs.”
  42. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    <p>
    The &uuml;ber-quick brown fox
    jumps over the lazy dogs.
    </p>
  44. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    The über-quick brown fox
    jumps over the lazy dogs.
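A toy character filter in Python, assuming the raw text is HTML like the `<p>` snippet on these slides: strip the tags, then decode entities such as `&uuml;`. Elasticsearch ships an `html_strip` character filter for this job; `char_filter` below is a hypothetical name for an illustration that ignores comments, scripts, and malformed markup.

```python
import html
import re


def char_filter(raw: str) -> str:
    """Drop HTML tags, then decode HTML entities (&uuml; -> ü)."""
    no_tags = re.sub(r"<[^>]+>", "", raw)      # remove anything shaped like <...>
    return html.unescape(no_tags).strip()      # decode entities, trim edge whitespace


raw = "<p>\nThe &uuml;ber-quick brown fox\njumps over the lazy dogs.\n</p>"
print(char_filter(raw))
```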

  45. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    The über-quick brown fox
    jumps over the lazy dogs.

  46. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    The über quick brown fox
    jumps over the lazy dogs

  47. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    The | über | quick | brown | fox | jumps | over | the | lazy | dogs
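A minimal tokenizer sketch: split on any run of non-word characters, which breaks "über-quick" into two tokens and drops the trailing period. This is a simplification; Elasticsearch's `standard` tokenizer follows Unicode text-segmentation rules rather than a single regex.

```python
import re


def tokenize(text: str) -> list[str]:
    # Split on runs of non-word characters (spaces, newlines, hyphens,
    # punctuation). In Python 3, \w is Unicode-aware, so "über" stays whole.
    # Empty strings from leading/trailing separators are dropped.
    return [tok for tok in re.split(r"\W+", text) if tok]


print(tokenize("The über-quick brown fox\njumps over the lazy dogs."))
```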

  49. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    the | über | quick | brown | fox | jumps | over | the | lazy | dogs

  51. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    the | uber | quick | brown | fox | jumps | over | the | lazy | dogs
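The first two token filters, sketched in Python: lowercasing, then ASCII folding. The folding here decomposes each character (Unicode NFKD) and drops combining marks, so "über" becomes "uber"; this only covers letters that decompose and is narrower than Elasticsearch's `asciifolding` filter (for example, "ø" passes through unchanged).

```python
import unicodedata


def lowercase(tokens: list[str]) -> list[str]:
    """Lowercase filter: 'The' and 'the' become the same term."""
    return [t.lower() for t in tokens]


def ascii_fold(tokens: list[str]) -> list[str]:
    """Decompose accented letters and drop the combining accent marks."""
    folded = []
    for t in tokens:
        decomposed = unicodedata.normalize("NFKD", t)
        folded.append("".join(ch for ch in decomposed if not unicodedata.combining(ch)))
    return folded


tokens = ["The", "über", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dogs"]
print(ascii_fold(lowercase(tokens)))
```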

  53. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    the | uber | quick | brown | fox | jump | over | the | lazy | dog
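A deliberately crude stemming sketch: strip one common English suffix if at least three letters remain. This is not Porter or any real stemmer (Elasticsearch offers proper ones as token filters); `toy_stem` is a hypothetical helper. It reproduces "jumps" to "jump" and "dogs" to "dog", but it also mangles "running" into "runn", and no suffix rule can turn the irregular "ran" into "run".

```python
def toy_stem(token: str) -> str:
    """Strip the first matching suffix ('ing', 'es', 's') if >= 3 letters remain."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


folded = ["the", "uber", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dogs"]
print([toy_stem(t) for t in folded])
```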

  55. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    uber | quick | brown | fox | jump | over | lazy | dog

  57. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    uber | quick | brown | fox vulpes | jump | over | lazy | dog canis
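The last two token filters, sketched with a tiny stop-word list and a hypothetical synonym map (`STOP_WORDS` and `SYNONYMS` are illustrative, far smaller than real lists): stop words are dropped, and each synonym is emitted right after the original term so "fox" also indexes "vulpes".

```python
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}   # tiny illustrative list
SYNONYMS = {"fox": ["vulpes"], "dog": ["canis"]}     # hypothetical synonym map


def remove_stop_words(tokens: list[str]) -> list[str]:
    return [t for t in tokens if t not in STOP_WORDS]


def add_synonyms(tokens: list[str]) -> list[str]:
    """Keep each token and append any synonyms directly after it."""
    out = []
    for t in tokens:
        out.append(t)
        out.extend(SYNONYMS.get(t, []))
    return out


stemmed = ["the", "uber", "quick", "brown", "fox", "jump", "over", "the", "lazy", "dog"]
print(add_synonyms(remove_stop_words(stemmed)))
```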

  58. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    Terms     Doc IDs
    brown     1
    canis     1
    dog       1
    fox       1
    jump      1
    lazy      1
    …
    over      1
    quick     1
    uber      1
    vulpes    1
  59. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    Terms     Doc IDs
    brown     1, 3, 6, …
    canis     1, 2, …
    dog       1, 2, 12…
    fox       1, 5, 7, …
    jump      1, 6, …
    lazy      1, 7, …
    …         3, 6, 7, …
    over      1, 3, 5, 6, …
    quick     1, 4, …
    uber      1, …
    vulpes    1, 5, 7, …
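The Terms / Doc IDs table is an inverted index. A minimal Python sketch, assuming documents have already been analyzed into term lists; doc 1 uses the terms from the slides, while docs 5 and 7 and their extra terms are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, list[str]]) -> dict[str, list[int]]:
    """Map each term to the sorted list of doc IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            postings[term].add(doc_id)
    # Sort terms alphabetically and each postings list numerically.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


docs = {
    1: ["uber", "quick", "brown", "fox", "vulpes", "jump", "over", "lazy", "dog", "canis"],
    5: ["fox", "vulpes", "den"],           # hypothetical
    7: ["lazy", "fox", "vulpes", "nap"],   # hypothetical
}
index = build_inverted_index(docs)
for term, ids in index.items():
    print(term, ids)
```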
  60. Elasticsearch Analyzer Chain (On Query Text): Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    “Jumping Foxes”
  61. Elasticsearch Analyzer Chain (On Query Text): Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    jump | fox | vulpes
  62. Elasticsearch Analyzer Chain (On Query Text): Raw Text → Character Filters → Tokenizer → Token Filters → Terms
    Terms     Doc IDs
    brown     1, 3, 6, …
    canis     1, 2, …
    dog       1, 2, …
    fox       1, 5, 7, …
    jump      1, 6, …
    lazy      1, 7, …
    …         3, 6, 7, …
    over      1, 3, 5, 6, …
    quick     1, 4, …
    uber      1, …
    vulpes    1, 5, 7, …
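The key point of the query-time slides is that the same analyzer runs on documents and on query text, so "Jumping Foxes" and "jumps … fox" meet at the terms "jump" and "fox". A self-contained toy version of the whole chain; every stage is a simplification, and the stop list, synonym map, and second post are hypothetical.

```python
import html
import re
import unicodedata
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "of", "the", "to"}   # tiny illustrative list
SYNONYMS = {"fox": ["vulpes"], "dog": ["canis"]}     # hypothetical synonym map


def stem(token: str) -> str:
    # Toy suffix stripper, not a real stemming algorithm.
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(raw: str) -> list[str]:
    """Raw Text -> Character Filters -> Tokenizer -> Token Filters -> Terms."""
    text = html.unescape(re.sub(r"<[^>]+>", " ", raw))   # character filters
    tokens = [t for t in re.split(r"\W+", text) if t]    # tokenizer
    terms = []
    for tok in tokens:                                   # token filters, in order
        tok = tok.lower()
        tok = "".join(c for c in unicodedata.normalize("NFKD", tok)
                      if not unicodedata.combining(c))
        tok = stem(tok)
        if tok in STOP_WORDS:
            continue
        terms.append(tok)
        terms.extend(SYNONYMS.get(tok, []))
    return terms


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    index = defaultdict(set)
    for doc_id, raw in docs.items():
        for term in analyze(raw):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> list[int]:
    """Analyze the query with the SAME chain, then union the postings (OR)."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return sorted(hits)


docs = {
    1: "<p>The &uuml;ber-quick brown fox jumps over the lazy dogs.</p>",
    2: "<p>A sleepy dog naps.</p>",  # hypothetical second post
}
index = build_index(docs)
print(analyze("Jumping Foxes"))        # ['jump', 'fox', 'vulpes']
print(search(index, "Jumping Foxes"))  # [1]
```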
  63. Understanding Language Through Analyzers
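For comparison with the toy code, a sketch of Elasticsearch index settings that declare a custom analyzer following the same chain, written as a Python dict and printed as JSON. The built-in component names (`html_strip`, `standard`, `lowercase`, `asciifolding`, `stemmer`, `stop`, `synonym`) are Elasticsearch's; `post_text`, `english_stemmer`, and `animal_synonyms` are hypothetical names. Available options vary between Elasticsearch versions, so check the analysis reference for the release you run.

```python
import json

# Hypothetical custom analyzer mirroring the chain on the slides:
# html_strip -> standard tokenizer -> lowercase -> asciifolding
# -> stemmer -> stop words -> synonyms.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "english_stemmer": {"type": "stemmer", "language": "english"},
                "animal_synonyms": {
                    "type": "synonym",
                    "synonyms": ["fox, vulpes", "dog, canis"],
                },
            },
            "analyzer": {
                "post_text": {
                    "type": "custom",
                    "char_filter": ["html_strip"],
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "asciifolding",
                        "english_stemmer",
                        "stop",
                        "animal_synonyms",
                    ],
                }
            },
        }
    }
}

print(json.dumps(settings, indent=2))
```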

  64. Primer on Queries

  65. Elasticsearch Filters & Queries
                Filters               Queries
    Speed       Fast                  Slow(er)
    Cached      Yes, With Bitsets!    No
    Matching    Boolean Yes/No        Relevancy Score
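Why filters are fast and cacheable: a filter only answers yes/no per document, so its result fits in a bitset that can be cached and combined with cheap bitwise operations. A Python sketch using an `int` as the bitset and `functools.lru_cache` as the cache; the posts and tags are hypothetical, and Lucene's real filter caching is more involved than this.

```python
from functools import lru_cache

# Hypothetical posts: doc ID -> set of tag names.
POST_TAGS = {
    0: {"wordpress", "hhvm"},
    1: {"wordpress"},
    2: {"ducks"},
    3: {"hhvm", "php"},
}


@lru_cache(maxsize=None)
def term_filter(tag: str) -> int:
    """Yes/no per document packed into an int: bit i set means doc i matches."""
    bits = 0
    for doc_id, tags in POST_TAGS.items():
        if tag in tags:
            bits |= 1 << doc_id
    return bits


def matching_ids(bits: int) -> list[int]:
    return [doc_id for doc_id in sorted(POST_TAGS) if (bits >> doc_id) & 1]


either = term_filter("wordpress") | term_filter("hhvm")  # OR of two bitsets
both = term_filter("wordpress") & term_filter("hhvm")    # AND, reusing cached results
print(matching_ids(either))  # [0, 1, 3]
print(matching_ids(both))    # [0]
```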
  66. Relevancy Score? TF-IDF

  67. Relevancy Score? Term Frequency × Inverse Document Frequency

  70. Relevancy Score? 
 ×
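A textbook TF-IDF sketch in Python: the raw count of a term in a document, times the log of (number of documents / number of documents containing the term). Lucene's actual scoring adds further factors such as length normalization and dampening, so treat this only as the core idea; the three-document corpus is made up.

```python
import math
from collections import Counter


def tf_idf(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    """Raw term frequency x log(N / document frequency)."""
    tf = Counter(doc)[term]
    df = sum(1 for d in corpus if term in d)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


corpus = [
    ["duck", "duck", "pond"],
    ["duck", "bread"],
    ["bread", "toast"],
]
print(tf_idf("duck", corpus[0], corpus))  # 2 * ln(3/2), about 0.81
print(tf_idf("pond", corpus[0], corpus))  # 1 * ln(3/1), about 1.10
```

The rarer term "pond" outscores "duck" in the first document even though "duck" appears twice there, which is the point of the inverse document frequency factor.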


  71. Elasticsearch Querying Best Practice: Filter to Reduce Possible Documents, then Query to Calculate Match Relevancy
  72. Elasticsearch Query Example
    { "query" : { … } }

  73. Elasticsearch Query Example
    { "query" : { "filtered" : {
        …
    } } }

  74. Elasticsearch Query Example
    { "query" : { "filtered" : {
        "filter" : { … },
        "query" : { … }
    } } }

  76. Elasticsearch Query Example (www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filters.html)
    { "query" : { "filtered" : {
        "filter" : { … },
        "query" : { … }
    } } }

  77. Elasticsearch Query Example
    { "query" : { "filtered" : {
        "filter" : {
            "terms" : {
                "tag.name" : [ "wordpress", "hhvm" ]
            }
        },
        "query" : { … }
    } } }

  78. Elasticsearch Query Example (www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-queries.html)
    { "query" : { "filtered" : {
        "filter" : { … },
        "query" : { … }
    } } }

  79. Elasticsearch Query Example
    { "query" : { "filtered" : {
        "filter" : { … },
        "query" : {
            "multi_match" : {
                "query" : "Speed Benchmarks",
                "fields" : [ "content", "title^5", "taxonomy.*.name" ]
            }
        }
    } } }
  80. Elasticsearch Query Example
    { "query" : { "filtered" : {
        "filter" : {
            "terms" : {
                "tag.name" : [ "wordpress", "hhvm" ]
            }
        },
        "query" : {
            "multi_match" : {
                "query" : "Speed Benchmarks",
                "fields" : [ "title^5", "content", "taxonomy.*.name" ]
            }
        }
    } } }
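A Python sketch that builds the filtered query body from these slides as a dict, so it can be serialized with `json.dumps` and sent to the search endpoint. The `filtered` query is Elasticsearch 1.x syntax; later releases deprecated and then removed it in favour of a `bool` query with a `filter` clause, so `bool_query` is included as a sketch of that form. Both function names are hypothetical helpers; the field names come from the slides.

```python
import json

FIELDS = ["title^5", "content", "taxonomy.*.name"]  # title matches boosted 5x


def filtered_query(tags: list[str], text: str) -> dict:
    """The 1.x-style body: filter on tag names, then score with multi_match."""
    return {
        "query": {
            "filtered": {
                "filter": {"terms": {"tag.name": tags}},
                "query": {"multi_match": {"query": text, "fields": FIELDS}},
            }
        }
    }


def bool_query(tags: list[str], text: str) -> dict:
    """Same intent for Elasticsearch versions without the `filtered` query."""
    return {
        "query": {
            "bool": {
                "filter": {"terms": {"tag.name": tags}},
                "must": {"multi_match": {"query": text, "fields": FIELDS}},
            }
        }
    }


body = filtered_query(["wordpress", "hhvm"], "Speed Benchmarks")
print(json.dumps(body, indent=2))
```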
  81. Elasticsearch Advanced Queries
    • Nesting of queries with boolean logic
    • Geo matching
    • Handling common terms (“to be or not to be”)
    • Matching with edits (misspelled words)
    • Multi-stage scoring with query rescore
    • Function scoring
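"Matching with edits" rests on edit distance: how many single-character insertions, deletions, or substitutions turn one word into another. A Python sketch using plain Levenshtein distance, where a swapped pair of letters costs two edits; Elasticsearch's own fuzzy matching may count edits differently, and `max_edits` here is only similar in spirit to its `fuzziness` limit.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))           # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                 # delete ca
                curr[j - 1] + 1,             # insert cb
                prev[j - 1] + (ca != cb),    # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_match(query: str, terms: list[str], max_edits: int = 2) -> list[str]:
    return [t for t in terms if levenshtein(query, t) <= max_edits]


print(levenshtein("kitten", "sitting"))                              # 3
print(fuzzy_match("wordpres", ["wordpress", "press", "word"], 1))    # ['wordpress']
```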
  82. Elasticsearch Advanced Queries

  83. Getting Started

  84. elasticsearch

  85. • Java package: elasticsearch.org/download
    • Runs on most servers with minimal configuration
    • Interact via native PHP clients or ES JSON REST API
    • Completely own your data
    • Access to the full set of filters, queries, and aggregations
    • Manage your own indexing and updates to keep posts in sync

    • WordPress plugin: jetpack.me
    • Wrapper for RESTful API calls to Automattic infrastructure
    • Interact via WordPress.com JSON REST API
    • Public posts synced
    • Limited to safe queries & filters; no aggregations*
    • Automatic syncing of posts; can manually trigger bulk sync
    * Does not apply to the VIP ES add-on
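On the self-hosted side, "interact via the ES JSON REST API" means sending JSON bodies over HTTP. A standard-library Python sketch that builds (but does not send) a POST to an index's `_search` endpoint; the node URL `http://localhost:9200` and the index name `wp_posts` are hypothetical, and the commented lines show how it could be sent to a live cluster.

```python
import json
import urllib.request


def search_request(base_url: str, index: str, body: dict) -> urllib.request.Request:
    """Build a POST to <base_url>/<index>/_search with a JSON body (not sent)."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local node and index name.
req = search_request("http://localhost:9200", "wp_posts",
                     {"query": {"match": {"content": "ducks"}}})
print(req.get_method(), req.full_url)

# Against a running cluster:
# with urllib.request.urlopen(req) as resp:
#     hits = json.load(resp)["hits"]["hits"]
```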
  86.  MySQL Schemas Elasticsearch Mappings

  87.  MySQL Schemas Elasticsearch Mappings developer.wordpress.com/docs/elasticsearch/post-doc-schema

  88. WordPress.com Elasticsearch Mappings
    • Dynamic mappings for taxonomies and post_meta
    • Intelligent handling of post tags, categories, taxonomies, and post_meta values
    • Tokenization & indexing of dates; optimized for multidimensional date queries
    • Extraction & indexing of meaningful objects within post content
    developer.wordpress.com/docs/elasticsearch/post-doc-schema
  89. Open Sourced: WordPress-Elasticsearch Lib
    • Index Builders: Helps create ES indices with tried and true mappings
    • Analyzer Builders: Helps create multi-lingual analyzer chains with linguistic best practices
    • Document Builders & Iterations: Tools to create ES documents and to help bulk indexing operations
    github.com/automattic/wpes-lib
  90.  ❤️

  91. Thanks! Code Wrangler, Automattic · @HypertextRanch · me@xyu.io · xyu