Save 37% off PRO during our Black Friday Sale! »

Elasticsearch SQL Deep Dive Session

Elasticsearch SQL Deep Dive Session

Elastic{ON} Tour London - June 22, 2017

Get an in-depth walkthrough of how to use Elasticsearch SQL, a SQL-based query DSL that will help you open the door to new data and new Elasticsearch users in your organization.

Costin Leau | Elasticsearch SQL Lead | Elastic

Dd9d954997353b37b4c2684f478192d3?s=128

Elastic Co

June 22, 2017
Tweet

Transcript

  1. Costin Leau Elasticsearch SQL Lead @costinl Elasticsearch SQL

  2. Agenda What is / is not Elasticsearch SQL ? What’s

    in a query? Clients 1 2 3
  3. What is / is not Elasticsearch SQL

  4. What is Elasticsearch SQL SQL interface to Elasticsearch, currently: •

    read-only, single index/type • mix of ANSI SQL (subset) and Elasticsearch (50%/50%) • in-development Same out-of-the-box experience: • Lightweight and fast • Easy to pick-up • No external dependencies/moving parts • Elasticsearch underneath • Part of X-Pack
  5. What Elasticsearch SQL is NOT database complete ANSI SQL implementation

    (see above) process external / outside Elasticsearch replacement to data modeling
  6. DEMO

  7. The life of a query

  8. A SQL interpreter anatomy Analysis Planning Execution Parsing SQL QUERY

    Unresolved AST Resolved/Logical Plan Optimized Plan Physical Plan Client Results
  9. SQL landscape at a glance RDBMS Cloud solutions Dedicated SQL

    storage Generic, in-memory compute/SQL
  10. Stop fighting the backend layer Analyzed vs Not-analyzed Analyzers Scoring

    Abstractions get in the way
  11. Declaration vs execution order 1. SELECT 2. FROM 3. WHERE

    4. GROUP BY 5. HAVING 6. ORDER BY 2. FROM 3. WHERE 4. GROUP BY 5. HAVING 1. SELECT 6. ORDER BY
  12. Parsing SELECT * FROM emp.emp WHERE emp_no > 10010

  13. Validates query Resolves tables, columns, functions, aliases, namespaces, etc… Analyzer

  14. Analysis example: resolving hidden references SELECT last_name FROM emp.emp ORDER

    BY emp_no
  15. Optimizer Changes the plan without affecting its output Logical optimizations

    • Operator simplifications • Operator pruning • Execution simplification • Execution elimination
  16. Optimizer: Operator pruning & simplification SELECT last_name FROM emp.emp WHERE

    gender = 'M' AND 1000 < emp_no AND NOT FALSE OR last_name LIKE 'K%' AND emp_no > 10010
  17. Optimizer example: Execution simplification SELECT salary, MIN(salary), MAX(salary), AVG(salary) ...

    "aggregation" : { .. "stats" : { field : "salary" } }
  18. Query Planner / Executor "aggregation" : { "10" : {

    "max" : { field : "salary“}, "12" : { "min" : { field : “tenure" } } max | min -------------+------------ 84013 |28 94918 |23 ...
  19. 19 Elasticsearch is a distributed, RESTful search and analytics engine

    elastic.co/products/elasticsearch
  20. Analyzed Full-text search Default for strings (text) Actual value depends

    on the analyzer By default has a not-analyzed field (keyword) Analyzed vs not-analyzed Not-analyzed Aggregations and exact matching Default for non-strings Required for sorting and aggregations
  21. "match" : { "gender" : { "query" : "M" }

    } Analyzed vs not-analyzed - Queries SELECT … WHERE gender = 'M' "term" : { "gender" : { "value" : "M" } }
  22. "_source" : { "includes" : [ "first_name" ] } Analyzed

    vs not-analyzed - Data "docvalue_fields" : [ "emp_no", "birth_date", "salary“, "tenure" ] SELECT * FROM ...
  23. "sort" : [{ "gender.keyword" : { "order" : "asc" }

    }] Analyzed vs not-analyzed - Value SELECT … ORDER BY gender "sort" : [{ "gender" : { "order" : "asc" } }]
  24. Full-Text Search "match" : { "first_name" : { "query" :

    "Erez", "operator" : "OR“ } } SELECT … WHERE MATCH(first_name, ‘Erez’)
  25. Full-Text Search SELECT … WHERE MATCH(‘first_name, last_name^2’, ‘Erez’) "multi_match" :

    { "query" : "Erez", "fields" : [ "first_name^1.0", "last_name^2.0"] }
  26. Full-Text Search SELECT … WHERE QUERY(‘Alain fox’) "query_string" : {

    "query" : “Alain fox" }
  27. Aggregate Ex: Avg, Count, Max Map to aggregations Functions Scalar

    Ex: Date Extraction, Math Map to scripting and/or custom processing
  28. Group By & Having SELECT column_name, aggregate_function(column_name) FROM ... WHERE

    ... GROUP BY column_name HAVING aggregate_function(column_name) operator value;
  29. SELECT column_name, aggregate_function(column_name) FROM ... WHERE ... GROUP BY column_name

    HAVING aggregate_function(column_name) operator value; Group By & Having Bucket Agg Pipeline Agg Metric Agg
  30. Supported through Nested and Parent-Child Joins employees birth_date emp_no first_name

    gender last_name salary tenure department dept_name dept_no from_date to_date 1..n Nested docs SELECT dep.dept_name, last_name FROM emp.emp WHERE emp_no = 10020 SELECT last_name FROM emp.emp WHERE dep.dept_name = 'Production'
  31. Clients

  32. OS independent Quick diagnostics and sanity checks Admin focused Optimized

    for efficiency CLI
  33. REST Language independent Elasticsearch specific

  34. Dedicated client (driver) and server component JDBC 4.2/Java 8 (downgrade

    possible) Supports java.sql and javax.sql APIs Pays attention to details – Timeouts (connect vs read vs network) – Logging Light, without dependencies (safe for embedding) JDBC
  35. Leverage existing rich eco-system JDBC

  36. DEMO

  37. Q & A Thank you!