Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Elasticsearch for SQL users - CodeMash 2016

Dd9d954997353b37b4c2684f478192d3?s=47 Elastic Co
January 08, 2016

Elasticsearch for SQL users - CodeMash 2016

Dd9d954997353b37b4c2684f478192d3?s=128

Elastic Co

January 08, 2016
Tweet

More Decks by Elastic Co

Other Decks in Programming

Transcript

  1. 1 Shaunak Kashyap Developer Advocate at Elastic Elasticsearch for SQL

    users
  2. 2 The Elastic Stack Elasticsearch Store, Index & Analyze Kibana

    User Interface Security Monitoring Alerting Plugins Logstash Beats Ingest Found: Elasticsearch as a Service Hosted Service
  3. 3 Agenda Search queries Data modeling Architecture 1 2 3

  4. 2 4 Agenda Search queries Data modeling Architecture 1 3

  5. 5 Agenda Search queries Data modeling 1 2 3 Architecture

  6. 6 Search Queries https://www.flickr.com/photos/samhames/4422128094

  7. 7 CREATE TABLE IF NOT EXISTS emails ( sender VARCHAR(255)

    NOT NULL, recipients TEXT, cc TEXT, bcc TEXT, subject VARCHAR(1024), body MEDIUMTEXT, datetime DATETIME ); CREATE INDEX emails_sender ON emails(sender); CREATE FULLTEXT INDEX emails_subject ON emails(subject); CREATE FULLTEXT INDEX emails_body ON emails(body); curl -XPOST 'http://localhost:9200/enron' -d' { "mappings": { "email": { "properties": { "sender": { "type": "string", "index": "not_analyzed" }, "recipients": { "type": "string", "index": "not_analyzed" }, "cc": { "type": "string", "index": "not_analyzed" }, "bcc": { "type": "string", "index": "not_analyzed" }, "subject": { "type": "string", "analyzer": "english" }, "body": { "type": "string", "analyzer": "english" } } } } Schemas
  8. 8 Loading the data

  9. 9 [LIVE DEMO] • Search for text in a single

    field • Search for text in multiple fields • Search for a phrase https://github.com/ycombinator/es-enron
  10. 10 Other Search Features Stemming Synonyms Did you mean? •

    Jump, jumped, jumping • Queen, monarch • Monetery => Monetary
  11. 11 Data Modeling https://www.flickr.com/photos/samhames/4422128094 https://www.flickr.com/photos/ericparker/7854157310

  12. 12 To analyze or not to analyze? PUT cities/city/1 {

    "city": "Sandusky", "population": 25340 } PUT cities/city/2 { "city": "New Albany", "population": 8829 } PUT cities/city/3 { "city": "New York", "population": 8406000 } POST cities/_search { "query": { "match": { "city": "New Albany" } } } QUERY + = ?
  13. 13 To analyze or not to analyze? PUT cities/city/1 {

    "city": "Sandusky", "population": 25340 } PUT cities/city/2 { "city": "New Albany", "population": 8829 } PUT cities/city/3 { "city": "New York", "population": 8406000 } Term Document IDs Albany 2 New 2,3 Sandusky 1 York 3
  14. 14 To analyze or not to analyze? PUT cities {

    "mappings": { "city": { "properties": { "city": { "type": "string", "index": "not_analyzed" } } } } } MAPPING Term Document IDs New Albany 2 New York 3 Sandusky 1
  15. PUT blog/post/1 { "author_id": 1, "title": "...", "body": "..." }

    PUT blog/post/2 { "author_id": 1, "title": "...", "body": "..." } PUT blog/post/3 { "author_id": 1, "title": "...", "body": "..." } 15 Relationships: Application-side joins PUT blog/author/1 { "name": "John Doe", "bio": "..." } POST blog/author/_search { "query": { "match": { "name": "John" } } } QUERY 1 POST blog/post/_search { "query": { "match": { "author_id": <each id from query 1 result> } } } QUERY 2
  16. PUT blog/post/1 { "author_name": "John Doe", "title": "...", "body": "..."

    } PUT blog/post/2 { "author_name": "John Doe", "title": "...", "body": "..." } 16 Relationships: Data denormalization POST blog/post/_search { "query": { "match": { "author_name": "John" } } } QUERY PUT blog/post/3 { "author_name": "John Doe", "title": "...", "body": "..." }
  17. 17 Relationships: Nested objects PUT blog/author/1 { "name": "John Doe",

    "bio": "...", "blog_posts": [ { "title": "...", "body": "..." }, { "title": "...", "body": "..." }, { "title": "...", "body": "..." } ] } POST blog/author/_search { "query": { "match": { "name": "John" } } } QUERY
  18. 18 Relationships: Parent-child documents PUT blog/author/1 { "name": "John Doe",

    "bio": "..." } POST blog/post/_search { "query": { "has_parent": { "type": "author", "query": { "match": { "name": "John" } } } QUERY PUT blog { "mappings": { "author": {}, "post": { "_parent": { "type": "author" } } } } PUT blog/post/1?parent=1 { "title": "...", "body": "..." } PUT blog/post/2?parent=1 { "title": "...", "body": "..." } PUT blog/post/3?parent=1 { "title": "...", "body": "..." }
  19. 19 Architecture https://www.flickr.com/photos/samhames/4422128094 https://www.flickr.com/photos/haribote/4871284379/

  20. 20 RDBMS Triggers database by Creative Stall from the Noun

    Project 1 2
  21. 21 Async replication to Elasticsearch 1 2 3 ESSynchronizer flow

    by Yamini Ahluwalia from the Noun Project
  22. 22 Async replication to Elasticsearch with Logstash 1 2 3

  23. 23 Forked writes from application 1 2

  24. 24 Forked writes from application (more robust) 1 2 queue

    by Huu Nguyen from the Noun Project ESSynchronizer 3 4
  25. 25 Forked writes from application (more robust with Logstash) 1

    2 3 4
  26. 26 Questions? @shaunak https://www.flickr.com/photos/nicknormal/2245559230/