Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Elasticsearch SQL

Elasticsearch SQL

Do you find yourself pointing new Elasticsearch users every which way to various reference guides, language documentations, and example console snippets? Are you onboarding new Elasticsearch users to your company/organization and trying to get them up to speed? Come see what Costin's been cookin'!

In this session, you'll get a taste of a new Elastic {Re}search project and some of the recipes behind it. Costin will discuss how Elastic is working to make Elasticsearch more accessible to users, applications, and systems. Join us and bring your own ideas and an open mind.

Costin Leau l Software Engineer l Elastic

Dd9d954997353b37b4c2684f478192d3?s=128

Elastic Co

March 08, 2017
Tweet

Transcript

  1. Elastic A Standard Query Language for Elasticsearch Costin Leau costin

    / @costinl
  2. Ess-que-el or Sequel?

  3. SQL Semantics SQL Engine for ES Closer look at Elasticsearch

    SQL 3 How does it work?
  4. ANSI SQL

  5. SQL characteristics Declarative Performs data definition, manipulation and control Fourth-generation

    programming language (4GL)
  6. SQL characteristics Declarative Performs data definition, manipulation and control SQL:1999

    is Turing complete Can do fractals Fourth-generation programming language (4GL) https://wiki.postgresql.org/wiki/Mandelbrot_set
  7. 7 SELECT * FROM emp.emp SELECT last_name, first_date FROM emp.emp

    WHERE tenure > 20 What to do, not how to do it Declarative Language
  8. 1. SELECT 2. FROM 3. WHERE 4. GROUP BY 5.

    HAVING 6. ORDER BY Declaration vs execution order 8 2. FROM 3. WHERE 4. GROUP BY 5. HAVING 1. SELECT 6. ORDER BY
  9. Data/Tabular format SQL uses table references (not columns) Data is

    ‘pipelined’ to the user • consumers move back and forth • each row has a backing catalog/table (source) • data might be updated
  10. Elasticsearch SQL

  11. Support Elasticsearch features as much as possible Support standard SQL

    as much as possible Same experience out of the box as Elasticsearch • Lightweight & fast • Easy to pick up • Still Elasticsearch Whiteboard Goal – Elasticsearch SQL
  12. SQL landscape at a glance RDBMS Cloud solutions Dedicated SQL

    storage Generic, in-memory compute/SQL
  13. A SQL interpreter anatomy Analysis Planning Execution Parsing SQL QUERY

    Unresolved AST Resolved/Logical Plan Optimized Plan Physical Plan Client Results
  14. 14 Reusable Customized Highly-specific • Parsing • Domain model •

    Optimizer • Parsing • Domain model (plans) • Optimizer • Analysis • Physical plan • Execution Using an existing SQL foundation
  15. Analyzed vs Not-analyzed Analyzers Scoring Abstractions got in the way

    Stop fighting the data layer
  16. SQL Engine

  17. Read-only SQL Targets single index/type Features: 50% SQL, 50% Elasticsearch

    Runs inside Elasticsearch Zero runtime footprint Current iteration
  18. Schema employees birth_date emp_no first_name gender last_name salary tenure department

    dept_name dept_no from_date to_date 1..n employees birth_date emp_no first_name gender last_name salary tenure department dept_name dept_no from_date to_date 1..n Tables Nested docs
  19. SQL Parser SELECT * FROM emp.emp WHERE emp_no > 10010

  20. • Validates the query • Resolves tables, columns, functions, aliases,

    namespaces, etc… Analyzer
  21. Analysis – aliasing SELECT e.last_name l FROM emp.emp e WHERE

    emp_no > 10010
  22. Analysis – resolving hidden references SELECT last_name FROM emp.emp ORDER

    BY emp_no
  23. Changes the plan without affecting its output Logical optimizations •

    Operator simplifications • Operator pruning • Execution simplification • Execution elimination Optimizer
  24. Optimizer SELECT last_name FROM emp.emp WHERE gender = 'M' AND

    1000 < emp_no AND NOT FALSE OR last_name LIKE 'K%' AND emp_no > 10010
  25. Query Planner Converts the logical plan into a physical plan

    SELECT last_name FROM emp.emp WHERE emp_no > 10010 {"query" : { "range" : { "emp_no" : { "from" : 10010, "to" : null }}}, "_source" : { "includes" :["last_name"] } }
  26. Runs the actual query Streams results Keep track of references

    Handles tabular format (aggs as a tree vs as a table) Data type conversion Query Executor
  27. elastic.co/products/elasticsearch { } Elasticsearch is a distributed, RESTful search and

    analytics engine
  28. Analyzed Full-text search Default for strings (text) Actual value depends

    on the analyzer By default has a not-analyzed field (keyword) Analyzed vs not-analyzed Not-analyzed Aggregations and exact matching Default for non-strings Required for sorting and aggregations
  29. "match" : { "gender" : { "query" : "M" }

    } Analyzed vs not-analyzed - Example SELECT … WHERE gender = 'M' "term" : { "gender" : { "value" : "M" } }
  30. "_source" : { "includes" : [ "birth_date" ] } Analyzed

    vs not-analyzed - Example "docvalue_fields" : [ "emp_no", "first_name", "gender", "last_name", "salary“, "tenure" ]
  31. "sort" : [{ "gender.keyword" : { "order" : "asc" }

    }] Sorting SELECT … ORDER BY gender
  32. Full-Text Search "match" : { "first_name" : { "query" :

    "Erez", "operator" : "OR“ } } SELECT … WHERE MATCH(first_name, ‘Erez’)
  33. Full-Text Search SELECT … WHERE MATCH(‘first_name, last_name^2’, ‘Erez’) "multi_match" :

    { "query" : "Erez", "fields" : [ "first_name^1.0", "last_name^2.0"] }
  34. Full-Text Search SELECT … WHERE QUERY(‘Baek fox’) "query_string" : {

    "query" : "Baek fox" }
  35. Aggregate Ex: Avg, Count, Max Map onto metrics aggregations Functions

    Scalar Ex: Date extraction Painlessly through scripting Monotonic functions are well optimized
  36. Group By & Having SELECT column_name, aggregate_function(column_name) FROM ... WHERE

    ... GROUP BY column_name HAVING aggregate_function(column_name) operator value;
  37. SELECT column_name, aggregate_function(column_name) FROM ... WHERE ... GROUP BY column_name

    HAVING aggregate_function(column_name) operator value; Group By & Having Bucket Agg Pipeline Agg Metric Agg
  38. Supported through Nested and Parent-Child Joins employees birth_date emp_no first_name

    gender last_name salary tenure department dept_name dept_no from_date to_date 1..n Nested docs SELECT dep.dept_name, last_name FROM emp.emp WHERE emp_no = 10020 SELECT last_name FROM emp.emp WHERE dep.dept_name = 'Production'
  39. Clients

  40. OS independent Quick diagnostics and sanity checks Admin focused Optimized

    for efficiency CLI
  41. REST Language independent Elasticsearch specific

  42. Dedicated client (driver) and server component JDBC 4.2/Java 8 (downgrade

    possible) Supports java.sql and javax.sql APIs Pays attention to details • Timeouts (connect vs read vs network) • Logging Light, without dependencies JDBC
  43. Leverage the existing rich eco-system JDBC

  44. 44 More Questions? Visit us at the AMA

  45. www.elastic.co

  46. Demo: Elasticsearch SQL

  47. Except where otherwise noted, this work is licensed under http://creativecommons.org/licenses/by-nd/4.0/

    Creative Commons and the double C in a circle are registered trademarks of Creative Commons in the United States and other countries. Third party marks and brands are the property of their respective holders. 47 Please attribute Elastic with a link to elastic.co