Slide 1

Slide 1 text

Elastic A Standard Query Language for Elasticsearch Costin Leau costin / @costinl

Slide 2

Slide 2 text

Ess-que-el or Sequel?

Slide 3

Slide 3 text

SQL Semantics SQL Engine for ES Closer look at Elasticsearch SQL 3 How does it work?

Slide 4

Slide 4 text

ANSI SQL

Slide 5

Slide 5 text

SQL characteristics Declarative Performs data definition, manipulation and control Fourth-generation programming language (4GL)

Slide 6

Slide 6 text

SQL characteristics Declarative Performs data definition, manipulation and control SQL:1999 is Turing complete Can do fractals Fourth-generation programming language (4GL) https://wiki.postgresql.org/wiki/Mandelbrot_set

Slide 7

Slide 7 text

7 SELECT * FROM emp.emp SELECT last_name, first_date FROM emp.emp WHERE tenure > 20 What to do, not how to do it Declarative Language

Slide 8

Slide 8 text

1. SELECT 2. FROM 3. WHERE 4. GROUP BY 5. HAVING 6. ORDER BY Declaration vs execution order 8 2. FROM 3. WHERE 4. GROUP BY 5. HAVING 1. SELECT 6. ORDER BY

Slide 9

Slide 9 text

Data/Tabular format SQL uses table references (not columns) Data is ‘pipelined’ to the user • consumers move back and forth • each row has a backing catalog/table (source) • data might be updated

Slide 10

Slide 10 text

Elasticsearch SQL

Slide 11

Slide 11 text

Support Elasticsearch features as much as possible Support standard SQL as much as possible Same experience out of the box as Elasticsearch • Lightweight & fast • Easy to pick up • Still Elasticsearch Whiteboard Goal – Elasticsearch SQL

Slide 12

Slide 12 text

SQL landscape at a glance RDBMS Cloud solutions Dedicated SQL storage Generic, in-memory compute/SQL

Slide 13

Slide 13 text

A SQL interpreter anatomy Analysis Planning Execution Parsing SQL QUERY Unresolved AST Resolved/Logical Plan Optimized Plan Physical Plan Client Results

Slide 14

Slide 14 text

14 Reusable Customized Highly-specific • Parsing • Domain model • Optimizer • Parsing • Domain model (plans) • Optimizer • Analysis • Physical plan • Execution Using an existing SQL foundation

Slide 15

Slide 15 text

Analyzed vs Not-analyzed Analyzers Scoring Abstractions got in the way Stop fighting the data layer

Slide 16

Slide 16 text

SQL Engine

Slide 17

Slide 17 text

Read-only SQL Targets single index/type Features: 50% SQL, 50% Elasticsearch Runs inside Elasticsearch Zero runtime footprint Current iteration

Slide 18

Slide 18 text

Schema employees birth_date emp_no first_name gender last_name salary tenure department dept_name dept_no from_date to_date 1..n employees birth_date emp_no first_name gender last_name salary tenure department dept_name dept_no from_date to_date 1..n Tables Nested docs

Slide 19

Slide 19 text

SQL Parser SELECT * FROM emp.emp WHERE emp_no > 10010

Slide 20

Slide 20 text

• Validates the query • Resolves tables, columns, functions, aliases, namespaces, etc… Analyzer

Slide 21

Slide 21 text

Analysis – aliasing SELECT e.last_name l FROM emp.emp e WHERE emp_no > 10010

Slide 22

Slide 22 text

Analysis – resolving hidden references SELECT last_name FROM emp.emp ORDER BY emp_no

Slide 23

Slide 23 text

Changes the plan without affecting its output Logical optimizations • Operator simplifications • Operator pruning • Execution simplification • Execution elimination Optimizer

Slide 24

Slide 24 text

Optimizer SELECT last_name FROM emp.emp WHERE gender = 'M' AND 1000 < emp_no AND NOT FALSE OR last_name LIKE 'K%' AND emp_no > 10010

Slide 25

Slide 25 text

Query Planner Converts the logical plan into a physical plan SELECT last_name FROM emp.emp WHERE emp_no > 10010 {"query" : { "range" : { "emp_no" : { "from" : 10010, "to" : null }}}, "_source" : { "includes" :["last_name"] } }

Slide 26

Slide 26 text

Runs the actual query Streams results Keep track of references Handles tabular format (aggs as a tree vs as a table) Data type conversion Query Executor

Slide 27

Slide 27 text

elastic.co/products/elasticsearch { } Elasticsearch is a distributed, RESTful search and analytics engine

Slide 28

Slide 28 text

Analyzed Full-text search Default for strings (text) Actual value depends on the analyzer By default has a not-analyzed field (keyword) Analyzed vs not-analyzed Not-analyzed Aggregations and exact matching Default for non-strings Required for sorting and aggregations

Slide 29

Slide 29 text

"match" : { "gender" : { "query" : "M" } } Analyzed vs not-analyzed - Example SELECT … WHERE gender = 'M' "term" : { "gender" : { "value" : "M" } }

Slide 30

Slide 30 text

"_source" : { "includes" : [ "birth_date" ] } Analyzed vs not-analyzed - Example "docvalue_fields" : [ "emp_no", "first_name", "gender", "last_name", "salary“, "tenure" ]

Slide 31

Slide 31 text

"sort" : [{ "gender.keyword" : { "order" : "asc" } }] Sorting SELECT … ORDER BY gender

Slide 32

Slide 32 text

Full-Text Search "match" : { "first_name" : { "query" : "Erez", "operator" : "OR“ } } SELECT … WHERE MATCH(first_name, ‘Erez’)

Slide 33

Slide 33 text

Full-Text Search SELECT … WHERE MATCH(‘first_name, last_name^2’, ‘Erez’) "multi_match" : { "query" : "Erez", "fields" : [ "first_name^1.0", "last_name^2.0"] }

Slide 34

Slide 34 text

Full-Text Search SELECT … WHERE QUERY(‘Baek fox’) "query_string" : { "query" : "Baek fox" }

Slide 35

Slide 35 text

Aggregate Ex: Avg, Count, Max Map onto metrics aggregations Functions Scalar Ex: Date extraction Painlessly through scripting Monotonic functions are well optimized

Slide 36

Slide 36 text

Group By & Having SELECT column_name, aggregate_function(column_name) FROM ... WHERE ... GROUP BY column_name HAVING aggregate_function(column_name) operator value;

Slide 37

Slide 37 text

SELECT column_name, aggregate_function(column_name) FROM ... WHERE ... GROUP BY column_name HAVING aggregate_function(column_name) operator value; Group By & Having Bucket Agg Pipeline Agg Metric Agg

Slide 38

Slide 38 text

Supported through Nested and Parent-Child Joins employees birth_date emp_no first_name gender last_name salary tenure department dept_name dept_no from_date to_date 1..n Nested docs SELECT dep.dept_name, last_name FROM emp.emp WHERE emp_no = 10020 SELECT last_name FROM emp.emp WHERE dep.dept_name = 'Production'

Slide 39

Slide 39 text

Clients

Slide 40

Slide 40 text

OS independent Quick diagnostics and sanity checks Admin focused Optimized for efficiency CLI

Slide 41

Slide 41 text

REST Language independent Elasticsearch specific

Slide 42

Slide 42 text

Dedicated client (driver) and server component JDBC 4.2/Java 8 (downgrade possible) Supports java.sql and javax.sql APIs Pays attention to details • Timeouts (connect vs read vs network) • Logging Light, without dependencies JDBC

Slide 43

Slide 43 text

Leverage the existing rich eco-system JDBC

Slide 44

Slide 44 text

44 More Questions? Visit us at the AMA

Slide 45

Slide 45 text

www.elastic.co

Slide 46

Slide 46 text

Demo: Elasticsearch SQL

Slide 47

Slide 47 text

Except where otherwise noted, this work is licensed under http://creativecommons.org/licenses/by-nd/4.0/ Creative Commons and the double C in a circle are registered trademarks of Creative Commons in the United States and other countries. Third party marks and brands are the property of their respective holders. 47 Please attribute Elastic with a link to elastic.co