Slide 1

Slide 1 text

Elasticsearch for SQL users
Shaunak Kashyap
Developer at Elastic
@shaunak

Slide 2

Slide 2 text

The Elastic Stack
• Store, Index & Analyze
• Ingest
• User Interface
• Plugins
• Hosted Service

Slide 3

Slide 3 text

Agenda
1. Search queries
2. Data modeling
3. Architecture

Slide 4

Slide 4 text

Agenda
1. Search queries
2. Data modeling
3. Architecture

Slide 5

Slide 5 text

Agenda
1. Search queries
2. Data modeling
3. Architecture

Slide 6

Slide 6 text

Search Queries
Photo: https://www.flickr.com/photos/samhames/4422128094

Slide 7

Slide 7 text

Schemas

SQL:

    CREATE TABLE IF NOT EXISTS emails (
      sender VARCHAR(255) NOT NULL,
      recipients TEXT,
      cc TEXT,
      bcc TEXT,
      subject VARCHAR(1024),
      body MEDIUMTEXT,
      datetime DATETIME
    );
    CREATE INDEX emails_sender ON emails(sender);
    CREATE FULLTEXT INDEX emails_subject ON emails(subject);
    CREATE FULLTEXT INDEX emails_body ON emails(body);

Elasticsearch:

    curl -XPUT 'http://localhost:9200/enron' -d'
    {
      "mappings": {
        "email": {
          "properties": {
            "sender": { "type": "keyword" },
            "recipients": { "type": "keyword" },
            "cc": { "type": "keyword" },
            "bcc": { "type": "keyword" },
            "subject": { "type": "text", "analyzer": "english" },
            "body": { "type": "text", "analyzer": "english" }
          }
        }
      }
    }'

Slide 8

Slide 8 text

Loading the data

Slide 9

Slide 9 text

[LIVE DEMO]
• Search for text in a single field
• Search for text in multiple fields
• Search for a phrase
https://github.com/ycombinator/es-enron
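The three demo searches correspond to three query types in the Elasticsearch query DSL: `match`, `multi_match`, and `match_phrase`. The following is a minimal sketch of the request bodies, assuming the `subject` and `body` fields from the Schemas slide. The helper names and the sample search text are hypothetical; the queries actually used in the demo are in the linked repository.

```python
import json


def single_field_query(field, text):
    # "match" analyzes the text and searches a single field.
    return {"query": {"match": {field: text}}}


def multi_field_query(fields, text):
    # "multi_match" runs the same analyzed search across several fields.
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


def phrase_query(field, phrase):
    # "match_phrase" requires the terms to appear together and in order.
    return {"query": {"match_phrase": {field: phrase}}}


if __name__ == "__main__":
    # Hypothetical search text; each body could be POSTed to enron/_search.
    bodies = [
        single_field_query("subject", "meeting"),
        multi_field_query(["subject", "body"], "meeting"),
        phrase_query("body", "quarterly earnings"),
    ]
    for body in bodies:
        print(json.dumps(body, indent=2))
```

Each function only builds a dictionary, so the bodies can be inspected or serialized before being sent with any HTTP client.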

Slide 10

Slide 10 text

Other Search Features
• Stemming: Jump, jumped, jumping
• Synonyms: Queen, monarch
• Did you mean?: Monetery => Monetary
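To make two of these features concrete, here is a toy illustration of "Did you mean?" and synonym expansion using only Python's standard library. It is not how Elasticsearch implements them (Elasticsearch uses suggesters and analyzer token filters); the vocabulary, synonym groups, and function names are assumptions made up for this sketch.

```python
import difflib

# Hypothetical vocabulary of terms already present in an index.
VOCABULARY = ["monetary", "policy", "queen", "monarch", "jump"]

# Hypothetical synonym groups.
SYNONYM_GROUPS = [{"queen", "monarch"}]


def did_you_mean(word, vocabulary=VOCABULARY):
    """Return the closest known term, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1)
    return matches[0] if matches else None


def expand_synonyms(word, groups=SYNONYM_GROUPS):
    """Return the word together with its synonyms, sorted for stable output."""
    word = word.lower()
    for group in groups:
        if word in group:
            return sorted(group)
    return [word]


if __name__ == "__main__":
    print(did_you_mean("Monetery"))
    print(expand_synonyms("Queen"))
```

A search layer built this way would rewrite the user's query before looking up terms, which is the same idea the slide's examples point at.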

Slide 11

Slide 11 text

Data Modeling
Photos: https://www.flickr.com/photos/samhames/4422128094
https://www.flickr.com/photos/ericparker/7854157310

Slide 12

Slide 12 text

To analyze (text) or not to analyze (keyword)?

    PUT cities/city/1
    { "city": "Omaha", "population": 434353 }

    PUT cities/city/2
    { "city": "New Albany", "population": 8829 }

    PUT cities/city/3
    { "city": "New York", "population": 8406000 }

QUERY

    POST cities/_search
    {
      "query": {
        "match": { "city": "New Albany" }
      }
    }

Documents + query = ?

Slide 13

Slide 13 text

To analyze (text) or not to analyze (keyword)?

    PUT cities/city/1
    { "city": "Omaha", "population": 434353 }

    PUT cities/city/2
    { "city": "New Albany", "population": 8829 }

    PUT cities/city/3
    { "city": "New York", "population": 8406000 }

    Term      Document IDs
    albany    2
    new       2,3
    omaha     1
    york      3

Slide 14

Slide 14 text

To analyze (text) or not to analyze (keyword)?

MAPPING

    PUT cities
    {
      "mappings": {
        "city": {
          "properties": {
            "city": { "type": "keyword" }
          }
        }
      }
    }

    Term          Document IDs
    New Albany    2
    New York      3
    Omaha         1
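The two term tables can be reproduced with a small in-memory inverted index. This is a simplified sketch: `analyze` is a stand-in that only lowercases and splits on whitespace, and `build_index` and `match` are hypothetical helpers, not Elasticsearch APIs. It shows why a `match` query for "New Albany" on an analyzed `text` field also returns New York (query terms are OR-ed by default), while a `keyword` field matches only the exact value.

```python
from collections import defaultdict

# The three documents from the slides, keyed by document ID.
CITIES = {1: "Omaha", 2: "New Albany", 3: "New York"}


def analyze(value):
    # Stand-in for a text analyzer: lowercase and split on whitespace.
    return value.lower().split()


def build_index(docs, analyzed):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        terms = analyze(value) if analyzed else [value]
        for term in terms:
            index[term].add(doc_id)
    return index


def match(index, text, analyzed):
    """Return IDs of documents containing any of the query's terms."""
    terms = analyze(text) if analyzed else [text]
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return sorted(hits)


if __name__ == "__main__":
    text_index = build_index(CITIES, analyzed=True)
    keyword_index = build_index(CITIES, analyzed=False)
    print(match(text_index, "New Albany", analyzed=True))      # [2, 3]
    print(match(keyword_index, "New Albany", analyzed=False))  # [2]
```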

Slide 15

Slide 15 text

Relationships: Application-side joins

    PUT blog/author/1
    { "name": "John Doe", "bio": "..." }

    PUT blog/post/1
    { "author_id": 1, "title": "...", "body": "..." }

    PUT blog/post/2
    { "author_id": 1, "title": "...", "body": "..." }

    PUT blog/post/3
    { "author_id": 1, "title": "...", "body": "..." }

QUERY 1

    POST blog/author/_search
    {
      "query": {
        "match": { "name": "John" }
      }
    }

QUERY 2

    POST blog/post/_search
    {
      "query": {
        "match": { "author_id": <author ID returned by QUERY 1> }
      }
    }
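The two-query pattern above can be sketched in application code. This example uses in-memory dictionaries as stand-ins for the `author` and `post` types; the function names, the second author, and the post titles are hypothetical additions for illustration. In a real application each function would issue the corresponding search request.

```python
# In-memory stand-ins for the blog/author and blog/post types.
AUTHORS = {
    1: {"name": "John Doe"},
    2: {"name": "Jane Roe"},  # hypothetical second author, not on the slide
}
POSTS = {
    1: {"author_id": 1, "title": "post one"},
    2: {"author_id": 1, "title": "post two"},
    3: {"author_id": 2, "title": "post three"},  # hypothetical
}


def search_authors(name_term):
    """QUERY 1: IDs of authors whose name contains the term as a word."""
    term = name_term.lower()
    return [
        author_id
        for author_id, author in AUTHORS.items()
        if term in author["name"].lower().split()
    ]


def search_posts(author_ids):
    """QUERY 2: IDs of posts written by any of the given authors."""
    wanted = set(author_ids)
    return [
        post_id for post_id, post in POSTS.items() if post["author_id"] in wanted
    ]


def posts_by_author_name(name_term):
    # The join happens here, in the application, not in the data store.
    return search_posts(search_authors(name_term))


if __name__ == "__main__":
    print(posts_by_author_name("John"))  # [1, 2]
```

The cost of this approach is one extra round trip per join, and the ID list passed to the second query grows with the number of matching authors.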

Slide 16

Slide 16 text

Relationships: Data denormalization

    PUT blog/post/1
    { "author_name": "John Doe", "title": "...", "body": "..." }

    PUT blog/post/2
    { "author_name": "John Doe", "title": "...", "body": "..." }

    PUT blog/post/3
    { "author_name": "John Doe", "title": "...", "body": "..." }

QUERY

    POST blog/post/_search
    {
      "query": {
        "match": { "author_name": "John" }
      }
    }

Slide 17

Slide 17 text

Relationships: Nested objects

    PUT blog/author/1
    {
      "name": "John Doe",
      "bio": "...",
      "blog_posts": [
        { "title": "...", "body": "..." },
        { "title": "...", "body": "..." },
        { "title": "...", "body": "..." }
      ]
    }

QUERY

    POST blog/author/_search
    {
      "query": {
        "match": { "name": "John" }
      }
    }

Slide 18

Slide 18 text

Relationships: Parent-child documents

    PUT blog
    {
      "mappings": {
        "author": {},
        "post": {
          "_parent": { "type": "author" }
        }
      }
    }

    PUT blog/author/1
    { "name": "John Doe", "bio": "..." }

    PUT blog/post/1?parent=1
    { "title": "...", "body": "..." }

    PUT blog/post/2?parent=1
    { "title": "...", "body": "..." }

    PUT blog/post/3?parent=1
    { "title": "...", "body": "..." }

QUERY

    POST blog/post/_search
    {
      "query": {
        "has_parent": {
          "type": "author",
          "query": {
            "match": { "name": "John" }
          }
        }
      }
    }

Slide 19

Slide 19 text

Architecture
Photos: https://www.flickr.com/photos/samhames/4422128094
https://www.flickr.com/photos/haribote/4871284379/

Slide 20

Slide 20 text

RDBMS Triggers
[Diagram: steps 1 and 2]
Icon credit: database by Creative Stall from the Noun Project

Slide 21

Slide 21 text

Async replication to Elasticsearch
[Diagram: steps 1 to 3, with an ESSynchronizer component]
Icon credit: flow by Yamini Ahluwalia from the Noun Project
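The diagram for this slide is not recoverable from the transcript, so the following is only one plausible shape for an ESSynchronizer: a job that periodically polls the relational database for rows changed since its last run and copies them into the search index. Everything here is an assumption made for the sketch, including the `changed_at` counter column, the `emails` table layout, and the use of a plain dictionary in place of an Elasticsearch index.

```python
import sqlite3


def create_source_db():
    """Create an in-memory database with a hypothetical emails table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE emails ("
        "id INTEGER PRIMARY KEY, subject TEXT, changed_at INTEGER NOT NULL)"
    )
    return conn


class ESSynchronizer:
    """Copies rows changed since the last run into a search index stand-in."""

    def __init__(self, conn, index):
        self.conn = conn
        self.index = index  # dict standing in for an Elasticsearch index
        self.last_seen = 0  # high-water mark of the change counter

    def run_once(self):
        rows = self.conn.execute(
            "SELECT id, subject, changed_at FROM emails "
            "WHERE changed_at > ? ORDER BY changed_at",
            (self.last_seen,),
        ).fetchall()
        for row_id, subject, changed_at in rows:
            self.index[row_id] = {"subject": subject}  # upsert by ID
            self.last_seen = changed_at
        return len(rows)


if __name__ == "__main__":
    conn = create_source_db()
    conn.execute("INSERT INTO emails VALUES (1, 'hello', 1)")
    conn.execute("INSERT INTO emails VALUES (2, 'world', 2)")
    index = {}
    sync = ESSynchronizer(conn, index)
    print(sync.run_once(), index)
```

Because the index is written after the database commit, searches can briefly lag behind the source of truth; that delay is the "async" in the slide title.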

Slide 22

Slide 22 text

Async replication to Elasticsearch with Logstash
[Diagram: steps 1 to 3]

Slide 23

Slide 23 text

Forked writes from application
[Diagram: steps 1 and 2]

Slide 24

Slide 24 text

Forked writes from application (more robust)
[Diagram: steps 1 to 4, with a queue and an ESSynchronizer component]
Icon credit: queue by Huu Nguyen from the Noun Project
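A rough sketch of the queue-based variant follows. The application writes to its primary store and publishes the same change to a queue, and a separate ESSynchronizer consumes the queue and updates the search index. The class and method names are hypothetical, the step order in the slide's diagram is not recoverable, and dictionaries plus Python's in-process `queue.Queue` stand in for the database, Elasticsearch, and a real message broker.

```python
import queue


class Application:
    """Writes each change to the primary store and forks it to a queue."""

    def __init__(self, database, change_queue):
        self.database = database
        self.change_queue = change_queue

    def save(self, doc_id, doc):
        self.database[doc_id] = doc            # write to the primary store
        self.change_queue.put((doc_id, doc))   # publish the change for indexing


class ESSynchronizer:
    """Drains the queue and applies each change to the search index."""

    def __init__(self, change_queue, index):
        self.change_queue = change_queue
        self.index = index  # dict standing in for an Elasticsearch index

    def drain(self):
        applied = 0
        while True:
            try:
                doc_id, doc = self.change_queue.get_nowait()
            except queue.Empty:
                return applied
            self.index[doc_id] = doc
            self.change_queue.task_done()
            applied += 1


if __name__ == "__main__":
    changes = queue.Queue()
    database, index = {}, {}
    app = Application(database, changes)
    app.save(1, {"subject": "hello"})
    app.save(2, {"subject": "world"})
    print(ESSynchronizer(changes, index).drain(), index == database)
```

The queue is what makes this more robust than writing to both stores directly: if the indexing side is down, changes accumulate in the queue instead of being lost, and the synchronizer catches up when it returns.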

Slide 25

Slide 25 text

Forked writes from application (more robust with Logstash)
[Diagram: steps 1 to 4]

Slide 26

Slide 26 text

Questions?
@shaunak
Photo: https://www.flickr.com/photos/nicknormal/2245559230/