Slide 1

Slide 1 text

WordPress ❤️ Elasticsearch WordCamp Vancouver (July 2014)

Slide 2

Slide 2 text

Xiao Yu
Code Wrangler, Automattic
@HypertextRanch [email protected] xyu

Slide 3

Slide 3 text

Photo by Abhimanyu (Manyu) Sabnis, www.shutterfeet.com

Slide 4

Slide 4 text

elasticsearch

Slide 5

Slide 5 text

–Andrew Oliver, InfoWorld “Ultra-hip Elasticsearch”

Slide 6

Slide 6 text

[Chart: Google Trends, January 2005 to January 2014; series: lighttpd, nginx]

Slide 7

Slide 7 text

–elasticsearch.org “Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine.”

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

Data Store vs. Data Store… Fight!

                 MySQL            Elasticsearch
Type             RDBMS            Document Store
Storage Model    Normalized       Denormalized
Consistency      Strong           Eventual
Replication      Master-Slave     Native HA
Inserts          Fast             Slow
Query Language   SQL              Expressive JSON
Query Execution  Single Threaded  Distributed
Analytics        Simple           Advanced

Slide 10

Slide 10 text

2 Data Stores

Slide 11

Slide 11 text

2 Data Stores: more complexity

Slide 12

Slide 12 text

2 Data Stores: more complexity
2 Data Stores: more points of failure

Slide 13

Slide 13 text

2 Data Stores: more complexity
2 Data Stores: more points of failure
2 Data Stores: more cost

Slide 14

Slide 14 text

Worth it?

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

 ❤️

Slide 17

Slide 17 text

Open & Inclusive  ❤️

Slide 18

Slide 18 text

Allows Us to Tinker & Inclusive  ❤️

Slide 19

Slide 19 text

Allows Us to Tinker & Optimized to Run Anywhere*  ❤️ * Almost

Slide 20

Slide 20 text

Openness & Inclusiveness  ❤️

Slide 21

Slide 21 text

Expressing Ourselves with Written Language  ❤️

Slide 22

Slide 22 text

howdy  ❤️

Slide 23

Slide 23 text

مرحبا (Arabic: "hello")  ❤️

Slide 24

Slide 24 text

 ❤️

Slide 25

Slide 25 text

Written Language  ❤️

Slide 26

Slide 26 text

Understands Strings

Slide 27

Slide 27 text

Understands Language

Slide 28

Slide 28 text

–My Blog to a Human “I almost ran into a swarm of baby ducks this morning…”

Slide 29

Slide 29 text

–My Blog to MySQL
0000000 49 20 61 6c 6d 6f 73 74 20 72 61 6e 20 69 6e 74
0000010 6f 20 61 20 73 77 61 72 6d 20 6f 66 20 62 61 62
0000020 79 20 64 75 63 6b 73 20 74 68 69 73 20 6d 6f 72
0000030 6e 69 6e 67 e2 80 a6…
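
The hex dump above can be reproduced with a short sketch. It assumes the post is stored as UTF-8 and that the sentence ends in a single ellipsis character (U+2026); the `hexdump` helper is a hypothetical name written for this illustration.

```python
# The sentence from the slides, ending in one ellipsis character (U+2026).
text = "I almost ran into a swarm of baby ducks this morning…"

# Assumption: the database stores the post as UTF-8 bytes.
raw = text.encode("utf-8")


def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as rows of an offset followed by two-digit hex values."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        rows.append(f"{offset:07x} " + " ".join(f"{b:02x}" for b in chunk))
    return "\n".join(rows)


print(hexdump(raw))
```

The last three bytes (e2 80 a6) are the UTF-8 encoding of the ellipsis. To the data store, the whole sentence is just this byte sequence.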

Slide 30

Slide 30 text

–Some Awesome Human “Man, I wonder what fantastic insights Xiao has about ducks.”

Slide 31

Slide 31 text

SELECT *
 FROM wp_posts
 WHERE post_content LIKE "%ducks%"

Slide 32

Slide 32 text

0000000 49 20 61 6c 6d 6f 73 74 20 72 61 6e 20 69 6e 74
 0000010 6f 20 61 20 73 77 61 72 6d 20 6f 66 20 62 61 62
 0000020 79 20 64 75 63 6b 73 20 74 68 69 73 20 6d 6f 72
 0000030 6e 69 6e 67 e2 80 a6…

Slide 33

Slide 33 text

0000000 49 20 61 6c 6d 6f 73 74 20 72 61 6e 20 69 6e 74
 0000010 6f 20 61 20 73 77 61 72 6d 20 6f 66 20 62 61 62
 0000020 79 20 64 75 63 6b 73 20 74 68 69 73 20 6d 6f 72
 0000030 6e 69 6e 67 e2 80 a6…

Slide 34

Slide 34 text

“I almost ran into a swarm of baby ducks
 this morning…”

Slide 35

Slide 35 text

“I almost ran into a swarm of baby ducks this morning…”

SELECT *
 FROM wp_posts
 WHERE post_content LIKE "%running%"

Slide 36

Slide 36 text

“I almost ran into a swarm of baby ducks this morning…”

SELECT *
 FROM wp_posts
 WHERE post_content LIKE "%running%"

Slide 37

Slide 37 text

RUNNING != RAN
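
A minimal sketch of why the LIKE query finds "ducks" but not "running". The `like_contains` helper is a hypothetical stand-in for `LIKE "%…%"`; it assumes a case-insensitive collation, which depends on how the MySQL column is configured.

```python
post = "I almost ran into a swarm of baby ducks this morning…"


def like_contains(haystack: str, needle: str) -> bool:
    """Rough stand-in for SQL LIKE '%needle%': a case-insensitive substring test."""
    return needle.lower() in haystack.lower()


print(like_contains(post, "ducks"))    # True
print(like_contains(post, "running"))  # False
```

Substring matching compares characters only. It has no notion that "ran" and "running" are forms of the same verb.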

Slide 38

Slide 38 text

Understands Language

Slide 39

Slide 39 text

Analyzing Text

Slide 40

Slide 40 text

Elasticsearch Analyzer Chain
Raw Text → Character Filters → Tokenizer → Token Filters → Terms

Slide 41

Slide 41 text

Elasticsearch Analyzer Chain Raw Text → Character Filters → Tokenizer → Token Filters → Terms “The über-quick brown fox
 jumps over the lazy dogs.”

Slide 42

Slide 42 text

Elasticsearch Analyzer Chain Raw Text → Character Filters → Tokenizer → Token Filters → Terms


 The über-quick brown fox
 jumps over the lazy dogs.


Slide 43

Slide 43 text

Elasticsearch Analyzer Chain Raw Text → Character Filters → Tokenizer → Token Filters → Terms


 The über-quick brown fox
 jumps over the lazy dogs.


Slide 44

Slide 44 text

Elasticsearch Analyzer Chain Raw Text → Character Filters → Tokenizer → Token Filters → Terms 
 The über-quick brown fox
 jumps over the lazy dogs.


Slide 45

Slide 45 text

Elasticsearch Analyzer Chain Raw Text → Character Filters → Tokenizer → Token Filters → Terms 
 The über—quick brown fox 
 jumps over the lazy dogs.


Slide 46

Slide 46 text

Elasticsearch Analyzer Chain Raw Text → Character Filters → Tokenizer → Token Filters → Terms 
 The über quick brown fox
 jumps over the lazy dogs


Slide 47

Slide 47 text

Elasticsearch Analyzer Chain
Raw Text → Character Filters → Tokenizer → Token Filters → Terms

The | über | quick | brown | fox | jumps | over | the | lazy | dogs


Slide 48

Slide 48 text

Elasticsearch Analyzer Chain
Raw Text → Character Filters → Tokenizer → Token Filters → Terms

The | über | quick | brown | fox | jumps | over | the | lazy | dogs


Slide 49

Slide 49 text

Elasticsearch Analyzer Chain
Raw Text → Character Filters → Tokenizer → Token Filters → Terms

the | über | quick | brown | fox | jumps | over | the | lazy | dogs


Slide 50

Slide 50 text

Elasticsearch Analyzer Chain
Raw Text → Character Filters → Tokenizer → Token Filters → Terms

the | über | quick | brown | fox | jumps | over | the | lazy | dogs


Slide 51

Slide 51 text

Elasticsearch Analyzer Chain
Raw Text → Character Filters → Tokenizer → Token Filters → Terms

the | uber | quick | brown | fox | jumps | over | the | lazy | dogs


Slide 52

Slide 52 text

Elasticsearch Analyzer Chain
Raw Text → Character Filters → Tokenizer → Token Filters → Terms

the | uber | quick | brown | fox | jumps | over | the | lazy | dogs


Slide 53

Slide 53 text

Elasticsearch Analyzer Chain
Raw Text → Character Filters → Tokenizer → Token Filters → Terms

the | uber | quick | brown | fox | jump | over | the | lazy | dog


Slide 54

Slide 54 text

Elasticsearch Analyzer Chain
Raw Text → Character Filters → Tokenizer → Token Filters → Terms

the | uber | quick | brown | fox | jump | over | the | lazy | dog


Slide 55

Slide 55 text

Elasticsearch Analyzer Chain
Raw Text → Character Filters → Tokenizer → Token Filters → Terms

uber | quick | brown | fox | jump | over | lazy | dog


Slide 56

Slide 56 text

Elasticsearch Analyzer Chain
Raw Text → Character Filters → Tokenizer → Token Filters → Terms

uber | quick | brown | fox | jump | over | lazy | dog


Slide 57

Slide 57 text

Elasticsearch Analyzer Chain
Raw Text → Character Filters → Tokenizer → Token Filters → Terms

uber | quick | brown | fox, vulpes | jump | over | lazy | dog, canis


Slide 58

Slide 58 text

Elasticsearch Analyzer Chain
Raw Text → Character Filters → Tokenizer → Token Filters → Terms

Terms    Doc IDs
brown    1
canis    1
dog      1
fox      1
jump     1
lazy     1
…
over     1
quick    1
uber     1
vulpes   1
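
The chain walked through on the preceding slides can be approximated in a few lines. This is a toy sketch, not Elasticsearch's implementation: the stopword list, the synonym map, and the suffix-stripping `stem` function are all assumptions chosen to reproduce the slide's example, and every function name here is hypothetical.

```python
import re
import unicodedata

STOPWORDS = {"the", "a", "an"}                    # assumed, tiny stopword list
SYNONYMS = {"fox": ["vulpes"], "dog": ["canis"]}  # assumed synonym map


def char_filter(text):
    """Character filter: replace hyphens with spaces before tokenizing."""
    return text.replace("-", " ")


def tokenize(text):
    """Tokenizer: keep runs of word characters, dropping punctuation."""
    return re.findall(r"\w+", text)


def ascii_fold(token):
    """Token filter: strip diacritics, so 'über' becomes 'uber'."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def stem(token):
    """Token filter: toy suffix stripper, not a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token


def analyze(text):
    """Run raw text through the whole chain and return the list of terms."""
    terms = []
    for token in tokenize(char_filter(text)):
        token = stem(ascii_fold(token.lower()))
        if token in STOPWORDS:
            continue
        terms.append(token)
        terms.extend(SYNONYMS.get(token, []))
    return terms


print(analyze("The über-quick brown fox jumps over the lazy dogs."))
print(analyze("Jumping Foxes"))
```

Running the same `analyze` function on both the document and the query text is what lets "Jumping Foxes" land on the terms jump, fox, and vulpes.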

Slide 59

Slide 59 text

Elasticsearch Analyzer Chain
Raw Text → Character Filters → Tokenizer → Token Filters → Terms

Terms    Doc IDs
brown    1, 3, 6, …
canis    1, 2, …
dog      1, 2, 12…
fox      1, 5, 7, …
jump     1, 6, …
lazy     1, 7, …
…        3, 6, 7, …
over     1, 3, 5, 6, …
quick    1, 4, …
uber     1, …
vulpes   1, 5, 7, …
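
A table like this is an inverted index: each term maps to the IDs of the documents that contain it. The sketch below builds one from a few hypothetical, already-analyzed documents; the document contents and the `build_inverted_index` name are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical, already-analyzed documents: doc ID -> list of terms.
analyzed_docs = {
    1: ["uber", "quick", "brown", "fox", "vulpes", "jump", "over", "lazy", "dog", "canis"],
    2: ["dog", "canis", "bark"],
    3: ["brown", "bear", "over", "hill"],
}


def build_inverted_index(docs):
    """Map each term to the sorted list of doc IDs that contain it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(index.items())}


index = build_inverted_index(analyzed_docs)
for term, doc_ids in index.items():
    print(term, doc_ids)
```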

Slide 60

Slide 60 text

Elasticsearch Analyzer Chain: On Query Text
Raw Text → Character Filters → Tokenizer → Token Filters → Terms

“Jumping Foxes”

Slide 61

Slide 61 text

Elasticsearch Analyzer Chain: On Query Text
Raw Text → Character Filters → Tokenizer → Token Filters → Terms

jump | fox, vulpes

Slide 62

Slide 62 text

Elasticsearch Analyzer Chain: On Query Text
Raw Text → Character Filters → Tokenizer → Token Filters → Terms

Terms    Doc IDs
brown    1, 3, 6, …
canis    1, 2, …
dog      1, 2, …
fox      1, 5, 7, …
jump     1, 6, …
lazy     1, 7, …
…        3, 6, 7, …
over     1, 3, 5, 6, …
quick    1, 4, …
uber     1, …
vulpes   1, 5, 7, …
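
Once the query text has been reduced to terms, finding candidate documents is a lookup in the index. A minimal sketch, assuming the posting lists below are complete (the slide elides the rest with "…") and using hypothetical helper names:

```python
# Hypothetical posting lists (term -> doc IDs), treated as complete for this toy example.
index = {
    "fox": {1, 5, 7},
    "jump": {1, 6},
    "vulpes": {1, 5, 7},
}


def match_any(index, terms):
    """Docs containing at least one of the terms (union of posting lists)."""
    result = set()
    for term in terms:
        result |= index.get(term, set())
    return result


def match_all(index, terms):
    """Docs containing every term (intersection of posting lists)."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


query_terms = ["jump", "fox", "vulpes"]  # analyzed form of "Jumping Foxes"
print(sorted(match_any(index, query_terms)))
print(sorted(match_all(index, query_terms)))
```

No document text is scanned at query time. Only the posting lists for the query's terms are read and combined.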

Slide 63

Slide 63 text

Understanding Language Through Analyzers

Slide 64

Slide 64 text

Primer on Queries

Slide 65

Slide 65 text

Elasticsearch Filters & Queries

           Filters              Queries
Speed      Fast                 Slow(er)
Cached     Yes, With Bitsets!   No
Matching   Boolean Yes/No       Relevancy Score

Slide 66

Slide 66 text

Relevancy Score? TF-IDF

Slide 67

Slide 67 text

Relevancy Score? Term Frequency
 ×
 Inverse Document Frequency

Slide 68

Slide 68 text

Relevancy Score? Term Frequency
 ×
 Inverse Document Frequency

Slide 69

Slide 69 text

Relevancy Score? Term Frequency
 ×
 Inverse Document Frequency
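
A worked sketch of the product above, using the textbook definitions: raw term count for term frequency and log(N / df) for inverse document frequency. Lucene's practical scoring uses dampened and normalized variants of these, so the numbers here are illustrative only; the documents and function names are assumptions.

```python
import math

# Hypothetical, already-analyzed documents: doc ID -> list of terms.
docs = {
    1: ["quick", "brown", "fox", "jump", "fox"],
    2: ["lazy", "dog"],
    3: ["brown", "dog", "jump"],
    4: ["quick", "dog"],
}


def tf(term, terms):
    """Term frequency: how often the term occurs in one document."""
    return terms.count(term)


def idf(term, docs):
    """Inverse document frequency: rarer terms across the corpus score higher."""
    containing = sum(1 for terms in docs.values() if term in terms)
    if containing == 0:
        return 0.0
    return math.log(len(docs) / containing)


def tf_idf(term, doc_id, docs):
    return tf(term, docs[doc_id]) * idf(term, docs)


print(tf_idf("fox", 1, docs))  # rare term, appears twice in doc 1
print(tf_idf("dog", 2, docs))  # common term, appears once in doc 2
```

"fox" appears in only one of the four documents, so a match on it counts for more than a match on "dog", which appears in three.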

Slide 70

Slide 70 text

Relevancy Score? 
 ×


Slide 71

Slide 71 text

Elasticsearch Querying Best Practice Filter to Reduce Possible Documents then Query to Calculate Match Relevancy
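
The filter-then-query pattern can be sketched outside Elasticsearch as two steps: a cheap yes/no test that shrinks the candidate set, followed by scoring of the survivors only. The posts, the any-of tag filter, and the term-count score below are assumptions made for this illustration.

```python
# Hypothetical posts with tags (for filtering) and analyzed terms (for scoring).
posts = [
    {"id": 1, "tags": {"wordpress", "hhvm"}, "terms": ["speed", "benchmark", "speed"]},
    {"id": 2, "tags": {"wordpress"}, "terms": ["speed", "theme"]},
    {"id": 3, "tags": {"cooking"}, "terms": ["speed", "benchmark", "benchmark", "speed"]},
    {"id": 4, "tags": {"hhvm"}, "terms": ["install", "guide"]},
]


def search(posts, allowed_tags, query_terms):
    # Step 1: boolean filter; keep posts carrying at least one allowed tag.
    candidates = [p for p in posts if p["tags"] & allowed_tags]
    # Step 2: score only the survivors (simple term-count score, an assumption).
    scored = []
    for post in candidates:
        score = sum(post["terms"].count(t) for t in query_terms)
        if score > 0:
            scored.append((score, post["id"]))
    # Highest score first; ties broken by doc ID for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [post_id for _, post_id in scored]


print(search(posts, {"wordpress", "hhvm"}, ["speed", "benchmark"]))
```

Post 3 would have the highest score, but the filter removes it before any scoring work is spent on it.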

Slide 72

Slide 72 text

Elasticsearch Query Example

{
  "query" : { … }
}


Slide 73

Slide 73 text

Elasticsearch Query Example

{
  "query" : {
    "filtered" : { … }
  }
}


Slide 74

Slide 74 text

Elasticsearch Query Example

{
  "query" : {
    "filtered" : {
      "filter" : { … },
      "query" : { … }
    }
  }
}


Slide 75

Slide 75 text

Elasticsearch Query Example

{
  "query" : {
    "filtered" : {
      "filter" : { … },
      "query" : { … }
    }
  }
}


Slide 76

Slide 76 text

Elasticsearch Query Example
www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filters.html

{
  "query" : {
    "filtered" : {
      "filter" : { … },
      "query" : { … }
    }
  }
}


Slide 77

Slide 77 text

Elasticsearch Query Example

{
  "query" : {
    "filtered" : {
      "filter" : {
        "terms" : {
          "tag.name" : [
            "wordpress",
            "hhvm"
          ]
        }
      },
      "query" : { … }
    }
  }
}


Slide 78

Slide 78 text

Elasticsearch Query Example
www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-queries.html

{
  "query" : {
    "filtered" : {
      "filter" : { … },
      "query" : { … }
    }
  }
}


Slide 79

Slide 79 text

Elasticsearch Query Example

{
  "query" : {
    "filtered" : {
      "filter" : { … },
      "query" : {
        "multi_match" : {
          "query" : "Speed Benchmarks",
          "fields" : [
            "content", "title^5",
            "taxonomy.*.name"
          ]
        }
      }
    }
  }
}

Slide 80

Slide 80 text

Elasticsearch Query Example

{
  "query" : {
    "filtered" : {
      "filter" : {
        "terms" : {
          "tag.name" : [ "wordpress", "hhvm" ]
        }
      },
      "query" : {
        "multi_match" : {
          "query" : "Speed Benchmarks",
          "fields" : [
            "title^5", "content", "taxonomy.*.name"
          ]
        }
      }
    }
  }
}
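
The full request body can also be assembled programmatically. The sketch below builds the same structure as a Python dictionary; in the `fields` list, `title^5` boosts matches in the title by a factor of five. The `filtered` query shown in the slides was removed in Elasticsearch 5.0, where the equivalent form is a `bool` query with `must` and `filter` clauses, so the hypothetical `to_bool_query` helper shows that rewrite.

```python
import json

# The request body from the slide, expressed as a Python dictionary.
filtered_query = {
    "query": {
        "filtered": {
            "filter": {"terms": {"tag.name": ["wordpress", "hhvm"]}},
            "query": {
                "multi_match": {
                    "query": "Speed Benchmarks",
                    "fields": ["title^5", "content", "taxonomy.*.name"],
                }
            },
        }
    }
}


def to_bool_query(body):
    """Rewrite a `filtered` query as a `bool` query with `must` and `filter`."""
    filtered = body["query"]["filtered"]
    return {
        "query": {
            "bool": {
                "must": filtered["query"],      # scored part
                "filter": filtered["filter"],   # yes/no part, not scored
            }
        }
    }


print(json.dumps(to_bool_query(filtered_query), indent=2))
```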

Slide 81

Slide 81 text

Elasticsearch Advanced Queries
• Nesting of queries with boolean logic
• Geo matching
• Handling common terms (“to be or not to be”)
• Matching with edits (misspelled words)
• Multi stage scoring with query rescore
• Function scoring
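
As one illustration from the list above, "matching with edits" rests on edit distance: the number of single-character changes needed to turn one string into another. The sketch below implements plain Levenshtein distance and a hypothetical `fuzzy_match` helper; Elasticsearch's own fuzzy matching is more involved, so treat this as the underlying idea only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query, terms, max_edits=1):
    """Return the indexed terms within max_edits of the (possibly misspelled) query."""
    return [term for term in terms if edit_distance(query, term) <= max_edits]


print(fuzzy_match("wordpres", ["wordpress", "wordcamp", "hhvm"]))
```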

Slide 82

Slide 82 text

Elasticsearch Advanced Queries

Slide 83

Slide 83 text

Getting Started

Slide 84

Slide 84 text

elasticsearch

Slide 85

Slide 85 text

• Java package: elasticsearch.org/download
• Runs on most servers with minimal configuration
• Interact via native PHP clients or ES JSON REST API
• Completely own your data
• Access to full set of filters, queries, and aggregations
• Manage your own indexing and updates to keep posts in sync

• WordPress plugin: jetpack.me
• Wrapper for RESTful API calls to Automattic infrastructure
• Interact via WordPress.com JSON REST API
• Public posts synced
• Limited to safe queries & filters; no aggregations*
• Automatic syncing of posts; can manually trigger bulk sync

* Does not apply for VIP ES add-on
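
For the self-hosted option, interacting via the JSON REST API means sending a JSON body to a search endpoint over HTTP. The sketch below only builds the request and does not send it. The host `http://localhost:9200`, the index name `posts`, and the `content` field are assumptions for illustration; `build_search_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_search_request(base_url, index, body):
    """Build (but do not send) a POST request to an index's _search endpoint."""
    url = f"{base_url.rstrip('/')}/{index}/_search"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(
    "http://localhost:9200",  # assumed local node
    "posts",                  # assumed index name
    {"query": {"match": {"content": "ducks"}}},
)
print(request.get_method(), request.full_url)
# Sending it would be urllib.request.urlopen(request), which needs a running node.
```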

Slide 86

Slide 86 text

 MySQL Schemas Elasticsearch Mappings

Slide 87

Slide 87 text

 MySQL Schemas Elasticsearch Mappings developer.wordpress.com/docs/elasticsearch/post-doc-schema

Slide 88

Slide 88 text

WordPress.com Elasticsearch Mappings
• Dynamic mappings for taxonomies and post_meta
• Intelligent handling of post tags, categories, taxonomies, and post_meta values
• Tokenization & indexing dates; optimized for multidimensional date queries
• Extraction & indexing of meaningful objects within post content
developer.wordpress.com/docs/elasticsearch/post-doc-schema

Slide 89

Slide 89 text

Open Sourced: WordPress-Elasticsearch Lib
• Index Builders: Helps create ES indices with tried and true mappings
• Analyzer Builders: Helps create multi-lingual analyzer chains with linguistic best practices
• Document Builders & Iterations: Tools to create ES documents and to help bulk indexing operations
github.com/automattic/wpes-lib

Slide 90

Slide 90 text

 ❤️

Slide 91

Slide 91 text

Thanks!
Code Wrangler, Automattic
@HypertextRanch [email protected] xyu