WP ❤️ Elasticsearch

xyu
July 26, 2014

Searching in WordPress is not great: MySQL is a relational datastore, but most of our WP data consists of written language and is thus hard to search. Elasticsearch allows us to bridge this gap and finally give WP the search it deserves.

Transcript

  1. Data Store vs. Data Store… Fight!

                        MySQL            Elasticsearch
    Type                RDBMS            Document Store
    Storage Model       Normalized       Denormalized
    Consistency         Strong           Eventual
    Replication         Master-Slave     Native HA
    Inserts             Fast             Slow
    Query Language      SQL              Expressive JSON
    Query Execution     Single Threaded  Distributed
    Analytics           Simple           Advanced
  2. ! 2 Data Stores: more complexity
    ! 2 Data Stores: more points of failure
    ! 2 Data Stores: more cost
  3. –My Blog to a Human

    “I almost ran into a swarm of baby ducks this morning…”
  4. –My Blog to MySQL

        0000000 49 20 61 6c 6d 6f 73 74 20 72 61 6e 20 69 6e 74
        0000010 6f 20 61 20 73 77 61 72 6d 20 6f 66 20 62 61 62
        0000020 79 20 64 75 63 6b 73 20 74 68 69 73 20 6d 6f 72
        0000030 6e 69 6e 67 e2 80 a6…
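The dump on slide 4 appears to be nothing more than the UTF-8 bytes of the sentence from slide 3. A minimal Python sketch to reproduce it, assuming 16 bytes per row and a 7-digit hex offset (the `hexdump` helper is written for this illustration and is not from the deck):

```python
# The sentence from slide 3, ending in a single ellipsis character (U+2026).
text = "I almost ran into a swarm of baby ducks this morning…"
raw = text.encode("utf-8")


def hexdump(data, width=16):
    """Return rows of '<offset> <hex bytes>' similar to the slide's layout."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        rows.append(f"{offset:07x} " + " ".join(f"{b:02x}" for b in chunk))
    return rows


for row in hexdump(raw):
    print(row)
```

To a byte-oriented store the post is just this sequence; nothing in it says that "ran" and "running" are related words.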
  7. “I almost ran into a swarm of baby ducks this morning…”

        SELECT *
        FROM wp_posts
        WHERE post_content LIKE "%running%"
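The point of slide 7 is that a substring match has no notion of word forms. A small sketch of that failure, using Python's built-in `sqlite3` as a stand-in for MySQL (an assumption for the sake of a runnable example; the table here has only two columns and the `like_search` helper is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_content TEXT)")
conn.execute(
    "INSERT INTO wp_posts (post_content) VALUES (?)",
    ("I almost ran into a swarm of baby ducks this morning…",),
)


def like_search(word):
    """Return IDs of posts whose content contains `word` as a raw substring."""
    rows = conn.execute(
        "SELECT ID FROM wp_posts WHERE post_content LIKE ?",
        (f"%{word}%",),
    ).fetchall()
    return [row[0] for row in rows]


print(like_search("running"))  # no rows: "running" is not a substring of the post
print(like_search("ran"))      # matches, but only because the exact letters appear
```

A human reader would consider the post relevant to "running"; the `LIKE` pattern only compares characters.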
  9. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms

    “The über-quick brown fox jumps over the lazy dogs.”
  10. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms

        <p>
        The &uuml;ber-quick brown fox
        jumps over the lazy dogs.
        </p>
  12. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms

    The über-quick brown fox jumps over the lazy dogs.
  14. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms

    The über quick brown fox jumps over the lazy dogs
  15. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms

    The | über | quick | brown | fox | jumps | over | the | lazy | dogs
  17. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms

    the | über | quick | brown | fox | jumps | over | the | lazy | dogs
  19. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms

    the | uber | quick | brown | fox | jumps | over | the | lazy | dogs
  21. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms

    the | uber | quick | brown | fox | jump | over | the | lazy | dog
  23. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms

    uber | quick | brown | fox | jump | over | lazy | dog
  25. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms

    uber | quick | brown | fox, vulpes | jump | over | lazy | dog, canis
  26. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms

        Terms    Doc IDs
        brown    1
        canis    1
        dog      1
        fox      1
        jump     1
        lazy     1
        …
        over     1
        quick    1
        uber     1
        vulpes   1
  27. Elasticsearch Analyzer Chain: Raw Text → Character Filters → Tokenizer → Token Filters → Terms

        Terms    Doc IDs
        brown    1, 3, 6, …
        canis    1, 2, …
        dog      1, 2, 12…
        fox      1, 5, 7, …
        jump     1, 6, …
        lazy     1, 7, …
        …        3, 6, 7, …
        over     1, 3, 5, 6, …
        quick    1, 4, …
        uber     1, …
        vulpes   1, 5, 7, …
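The stages walked through on the analyzer slides can be imitated in plain Python to see how raw HTML ends up as the terms on slide 25. This is only a rough sketch, not how Elasticsearch implements analysis: the stemmer is a toy suffix stripper, the stop word list has one entry, and the synonym map holds just the two pairs shown on the slides. All function names are made up for the example.

```python
import html
import re
import unicodedata

STOP_WORDS = {"the"}                                # illustrative; real lists are longer
SYNONYMS = {"fox": ["vulpes"], "dog": ["canis"]}    # the two pairs from slide 25


def char_filter(text):
    """Character filters: strip HTML tags, then decode entities (&uuml; -> ü)."""
    return html.unescape(re.sub(r"<[^>]+>", " ", text))


def tokenize(text):
    """Tokenizer: split on anything that is not a letter, digit or underscore."""
    return re.findall(r"\w+", text)


def fold_ascii(token):
    """Token filter: drop diacritics, so 'über' becomes 'uber'."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def stem(token):
    """Token filter: toy stemmer (NOT Porter or Snowball), strips a few suffixes."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Run the whole chain and return the list of index terms."""
    terms = []
    for token in tokenize(char_filter(text)):
        token = stem(fold_ascii(token.lower()))
        if token in STOP_WORDS:
            continue
        terms.append(token)
        terms.extend(SYNONYMS.get(token, []))
    return terms


doc = "<p>The &uuml;ber-quick brown fox jumps over the lazy dogs.</p>"
print(analyze(doc))
```

Because the same `analyze` function can be applied to query text, "Jumping Foxes" reduces to `jump`, `fox`, `vulpes`, which is what lets a query match a document that never contains the literal query words.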
  28. Elasticsearch Analyzer Chain (On Query Text): Raw Text → Character Filters → Tokenizer → Token Filters → Terms

    “Jumping Foxes”
  29. Elasticsearch Analyzer Chain (On Query Text): Raw Text → Character Filters → Tokenizer → Token Filters → Terms

    jump | fox | vulpes
  30. Elasticsearch Analyzer Chain (On Query Text): Raw Text → Character Filters → Tokenizer → Token Filters → Terms

        Terms    Doc IDs
        brown    1, 3, 6, …
        canis    1, 2, …
        dog      1, 2, …
        fox      1, 5, 7, …
        jump     1, 6, …
        lazy     1, 7, …
        …        3, 6, 7, …
        over     1, 3, 5, 6, …
        quick    1, 4, …
        uber     1, …
        vulpes   1, 5, 7, …
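The terms table is an inverted index: each term maps to the IDs of the documents containing it, and the analyzed query terms are looked up in it. A minimal sketch of that lookup, assuming documents have already been analyzed into term lists. Document 1 uses the terms from the slides; documents 2 and 5 are hypothetical contents chosen to agree with the doc IDs shown in the table.

```python
from collections import defaultdict


def build_index(analyzed_docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, terms in analyzed_docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def lookup(index, query_terms):
    """Return the sorted IDs of documents containing ANY of the query terms."""
    hits = set()
    for term in query_terms:
        hits |= index.get(term, set())
    return sorted(hits)


analyzed_docs = {
    1: ["uber", "quick", "brown", "fox", "vulpes", "jump", "over", "lazy", "dog", "canis"],
    2: ["dog", "canis"],            # hypothetical document
    5: ["fox", "vulpes", "over"],   # hypothetical document
}
index = build_index(analyzed_docs)

# "Jumping Foxes" after analysis, as on slide 29.
print(lookup(index, ["jump", "fox", "vulpes"]))
```

This sketch only answers "which documents match"; Elasticsearch additionally scores each hit for relevancy, which is left out here.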
  31. Elasticsearch Filters & Queries

                  Filters              Queries
    Speed         Fast                 Slow(er)
    Cached        Yes, With Bitsets!   No
    Matching      Boolean Yes/No       Relevancy Score
  32. Elasticsearch Query Example

        { "query" : { … } }
  33. Elasticsearch Query Example

        { "query" : { "filtered" : {
          …
        } } }
  34. Elasticsearch Query Example

        { "query" : { "filtered" : {
          "filter" : { … },
          "query" : { … }
        } } }
  36. Elasticsearch Query Example

        { "query" : { "filtered" : {
          "filter" : {
            "terms" : {
              "tag.name" : [
                "wordpress",
                "hhvm"
              ]
            }
          },
          "query" : { … }
        } } }
  37. Elasticsearch Query Example

        { "query" : { "filtered" : {
          "filter" : { … },
          "query" : {
            "multi_match" : {
              "query" : "Speed Benchmarks",
              "fields" : [
                "content", "title^5",
                "taxonomy.*.name"
              ]
            }
          }
        } } }
  38. Elasticsearch Query Example

        { "query" : { "filtered" : {
          "filter" : {
            "terms" : {
              "tag.name" : [ "wordpress", "hhvm" ]
            }
          },
          "query" : {
            "multi_match" : {
              "query" : "Speed Benchmarks",
              "fields" : [
                "title^5", "content", "taxonomy.*.name"
              ]
            }
          }
        } } }
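The request body built up on the query example slides can be assembled programmatically. The sketch below builds the slide 38 body as a Python dict and, as an aside, the `bool` form that later Elasticsearch releases use, since the `filtered` query was deprecated in 2.0 and removed in 5.0. Both helper names are invented for this example; which form applies depends on the Elasticsearch version you run.

```python
import json


def filtered_query(tags, text, fields):
    """Body as on slide 38: an unscored terms filter plus a scored multi_match."""
    return {
        "query": {
            "filtered": {
                "filter": {"terms": {"tag.name": tags}},
                "query": {"multi_match": {"query": text, "fields": fields}},
            }
        }
    }


def bool_query(tags, text, fields):
    """Same intent for Elasticsearch 5.0 and later, expressed with a bool query."""
    return {
        "query": {
            "bool": {
                "must": {"multi_match": {"query": text, "fields": fields}},
                "filter": {"terms": {"tag.name": tags}},
            }
        }
    }


body = filtered_query(
    ["wordpress", "hhvm"],
    "Speed Benchmarks",
    ["title^5", "content", "taxonomy.*.name"],  # ^5 boosts matches in the title
)
print(json.dumps(body, indent=2))
```

In both forms the tag restriction only decides yes or no, while the `multi_match` part contributes the relevancy score, matching the filter and query split on slide 31.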
  39. Elasticsearch Advanced Queries

    • Nesting of queries with boolean logic
    • Geo matching
    • Handling common terms (“to be or not to be”)
    • Matching with edits (misspelled words)
    • Multi stage scoring with query rescore
    • Function scoring
  40. • Java package: elasticsearch.org/download
    • Runs on most servers with minimal configuration
    • Interact via native PHP clients or ES JSON REST API
    • Completely own your data
    • Access to full set of filters, queries, and aggregations
    • Manage your own indexing and updates to keep posts in sync

    • WordPress plugin: jetpack.me
    • Wrapper for RESTful API calls to Automattic infrastructure
    • Interact via WordPress.com JSON REST API
    • Public posts synced
    • Limited to safe queries & filters; no aggregations*
    • Automatic syncing of posts; can manually trigger bulk sync

    * Does not apply for VIP ES add-on
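For the self-hosted option, "interact via the ES JSON REST API" comes down to sending a JSON body to an index's `_search` endpoint. A minimal standard-library sketch, assuming a node at `http://localhost:9200` and a hypothetical index named `wp-posts` (neither comes from the deck); `build_search_request` and `run_search` are helper names made up here.

```python
import json
import urllib.request


def build_search_request(base_url, index, body):
    """Prepare a POST to <base_url>/<index>/_search with a JSON body."""
    url = f"{base_url.rstrip('/')}/{index}/_search"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(request, timeout=5):
    """Send the request and decode the JSON response (needs a reachable node)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_search_request(
    "http://localhost:9200",                              # assumed local node
    "wp-posts",                                           # hypothetical index name
    {"query": {"match": {"content": "jumping foxes"}}},
)
print(request.get_method(), request.full_url)
```

Only the request is constructed here; calling `run_search(request)` would actually hit the node. The Jetpack route wraps comparable calls to the WordPress.com API instead, so this sketch does not apply to it.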
  41. WordPress.com Elasticsearch Mappings

    • Dynamic mappings for taxonomies and post_meta
    • Intelligent handling of post tags, categories, taxonomies, and post_meta values
    • Tokenization & indexing dates; optimized for multidimensional date queries
    • Extraction & indexing of meaningful objects within post content

    developer.wordpress.com/docs/elasticsearch/post-doc-schema
  42. Open Sourced: WordPress-Elasticsearch Lib

    • Index Builders: help create ES indices with tried and true mappings
    • Analyzer Builders: help create multi-lingual analyzer chains with linguistic best practices
    • Document Builders & Iterations: tools to create ES documents and to help with bulk indexing operations

    github.com/automattic/wpes-lib