Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Elasticsearch SQL Deep Dive Session

Elasticsearch SQL Deep Dive Session

Elastic{ON} Tour London - June 22, 2017

Get an in-depth walkthrough of how to use Elasticsearch SQL, a SQL-based query DSL that will help you open the door to new data and new Elasticsearch users in your organization.

Costin Leau | Elasticsearch SQL Lead | Elastic

Elastic Co

June 22, 2017
Tweet

More Decks by Elastic Co

Other Decks in Technology

Transcript

  1. What is Elasticsearch SQL SQL interface to Elasticsearch, currently: •

    read-only, single index/type • mix of ANSI SQL (subset) and Elasticsearch (50%/50%) • in-development Same out-of-the-box experience: • Lightweight and fast • Easy to pick-up • No external dependencies/moving parts • Elasticsearch underneath • Part of X-Pack
  2. What Elasticsearch SQL is NOT database complete ANSI SQL implementation

    (see above) process external / outside Elasticsearch replacement to data modeling
  3. A SQL interpreter anatomy Analysis Planning Execution Parsing SQL QUERY

    Unresolved AST Resolved/Logical Plan Optimized Plan Physical Plan Client Results
  4. SQL landscape at a glance RDBMS Cloud solutions Dedicated SQL

    storage Generic, in-memory compute/SQL
  5. Declaration vs execution order 1. SELECT 2. FROM 3. WHERE

    4. GROUP BY 5. HAVING 6. ORDER BY 2. FROM 3. WHERE 4. GROUP BY 5. HAVING 1. SELECT 6. ORDER BY
  6. Optimizer Changes the plan without affecting its output Logical optimizations

    • Operator simplifications • Operator pruning • Execution simplification • Execution elimination
  7. Optimizer: Operator pruning & simplification SELECT last_name FROM emp.emp WHERE

    gender = 'M' AND 1000 < emp_no AND NOT FALSE OR last_name LIKE 'K%' AND emp_no > 10010
  8. Query Planner / Executor "aggregation" : { "10" : {

    "max" : { field : "salary“}, "12" : { "min" : { field : “tenure" } } max | min -------------+------------ 84013 |28 94918 |23 ...
  9. Analyzed Full-text search Default for strings (text) Actual value depends

    on the analyzer By default has a not-analyzed field (keyword) Analyzed vs not-analyzed Not-analyzed Aggregations and exact matching Default for non-strings Required for sorting and aggregations
  10. "match" : { "gender" : { "query" : "M" }

    } Analyzed vs not-analyzed - Queries SELECT … WHERE gender = 'M' "term" : { "gender" : { "value" : "M" } }
  11. "_source" : { "includes" : [ "first_name" ] } Analyzed

    vs not-analyzed - Data "docvalue_fields" : [ "emp_no", "birth_date", "salary“, "tenure" ] SELECT * FROM ...
  12. "sort" : [{ "gender.keyword" : { "order" : "asc" }

    }] Analyzed vs not-analyzed - Value SELECT … ORDER BY gender "sort" : [{ "gender" : { "order" : "asc" } }]
  13. Full-Text Search "match" : { "first_name" : { "query" :

    "Erez", "operator" : "OR“ } } SELECT … WHERE MATCH(first_name, ‘Erez’)
  14. Full-Text Search SELECT … WHERE MATCH(‘first_name, last_name^2’, ‘Erez’) "multi_match" :

    { "query" : "Erez", "fields" : [ "first_name^1.0", "last_name^2.0"] }
  15. Aggregate Ex: Avg, Count, Max Map to aggregations Functions Scalar

    Ex: Date Extraction, Math Map to scripting and/or custom processing
  16. Group By & Having SELECT column_name, aggregate_function(column_name) FROM ... WHERE

    ... GROUP BY column_name HAVING aggregate_function(column_name) operator value;
  17. SELECT column_name, aggregate_function(column_name) FROM ... WHERE ... GROUP BY column_name

    HAVING aggregate_function(column_name) operator value; Group By & Having Bucket Agg Pipeline Agg Metric Agg
  18. Supported through Nested and Parent-Child Joins employees birth_date emp_no first_name

    gender last_name salary tenure department dept_name dept_no from_date to_date 1..n Nested docs SELECT dep.dept_name, last_name FROM emp.emp WHERE emp_no = 10020 SELECT last_name FROM emp.emp WHERE dep.dept_name = 'Production'
  19. Dedicated client (driver) and server component JDBC 4.2/Java 8 (downgrade

    possible) Supports java.sql and javax.sql APIs Pays attention to details – Timeouts (connect vs read vs network) – Logging Light, without dependencies (safe for embedding) JDBC