Slide 1

Slide 1 text

Costin Leau Elasticsearch SQL Lead @costinl Elasticsearch SQL

Slide 2

Slide 2 text

Agenda What is / is not Elasticsearch SQL ? What’s in a query? Clients 1 2 3

Slide 3

Slide 3 text

What is / is not Elasticsearch SQL

Slide 4

Slide 4 text

What is Elasticsearch SQL SQL interface to Elasticsearch, currently: • read-only, single index/type • mix of ANSI SQL (subset) and Elasticsearch (50%/50%) • in-development Same out-of-the-box experience: • Lightweight and fast • Easy to pick-up • No external dependencies/moving parts • Elasticsearch underneath • Part of X-Pack

Slide 5

Slide 5 text

What Elasticsearch SQL is NOT database complete ANSI SQL implementation (see above) process external / outside Elasticsearch replacement to data modeling

Slide 6

Slide 6 text

DEMO

Slide 7

Slide 7 text

The life of a query

Slide 8

Slide 8 text

A SQL interpreter anatomy Analysis Planning Execution Parsing SQL QUERY Unresolved AST Resolved/Logical Plan Optimized Plan Physical Plan Client Results

Slide 9

Slide 9 text

SQL landscape at a glance RDBMS Cloud solutions Dedicated SQL storage Generic, in-memory compute/SQL

Slide 10

Slide 10 text

Stop fighting the backend layer Analyzed vs Not-analyzed Analyzers Scoring Abstractions get in the way

Slide 11

Slide 11 text

Declaration vs execution order 1. SELECT 2. FROM 3. WHERE 4. GROUP BY 5. HAVING 6. ORDER BY 2. FROM 3. WHERE 4. GROUP BY 5. HAVING 1. SELECT 6. ORDER BY

Slide 12

Slide 12 text

Parsing SELECT * FROM emp.emp WHERE emp_no > 10010

Slide 13

Slide 13 text

Validates query Resolves tables, columns, functions, aliases, namespaces, etc… Analyzer

Slide 14

Slide 14 text

Analysis example: resolving hidden references SELECT last_name FROM emp.emp ORDER BY emp_no

Slide 15

Slide 15 text

Optimizer Changes the plan without affecting its output Logical optimizations • Operator simplifications • Operator pruning • Execution simplification • Execution elimination

Slide 16

Slide 16 text

Optimizer: Operator pruning & simplification SELECT last_name FROM emp.emp WHERE gender = 'M' AND 1000 < emp_no AND NOT FALSE OR last_name LIKE 'K%' AND emp_no > 10010

Slide 17

Slide 17 text

Optimizer example: Execution simplification SELECT salary, MIN(salary), MAX(salary), AVG(salary) ... "aggregation" : { .. "stats" : { field : "salary" } }

Slide 18

Slide 18 text

Query Planner / Executor "aggregation" : { "10" : { "max" : { field : "salary“}, "12" : { "min" : { field : “tenure" } } max | min -------------+------------ 84013 |28 94918 |23 ...

Slide 19

Slide 19 text

19 Elasticsearch is a distributed, RESTful search and analytics engine elastic.co/products/elasticsearch

Slide 20

Slide 20 text

Analyzed Full-text search Default for strings (text) Actual value depends on the analyzer By default has a not-analyzed field (keyword) Analyzed vs not-analyzed Not-analyzed Aggregations and exact matching Default for non-strings Required for sorting and aggregations

Slide 21

Slide 21 text

"match" : { "gender" : { "query" : "M" } } Analyzed vs not-analyzed - Queries SELECT … WHERE gender = 'M' "term" : { "gender" : { "value" : "M" } }

Slide 22

Slide 22 text

"_source" : { "includes" : [ "first_name" ] } Analyzed vs not-analyzed - Data "docvalue_fields" : [ "emp_no", "birth_date", "salary“, "tenure" ] SELECT * FROM ...

Slide 23

Slide 23 text

"sort" : [{ "gender.keyword" : { "order" : "asc" } }] Analyzed vs not-analyzed - Value SELECT … ORDER BY gender "sort" : [{ "gender" : { "order" : "asc" } }]

Slide 24

Slide 24 text

Full-Text Search "match" : { "first_name" : { "query" : "Erez", "operator" : "OR“ } } SELECT … WHERE MATCH(first_name, ‘Erez’)

Slide 25

Slide 25 text

Full-Text Search SELECT … WHERE MATCH(‘first_name, last_name^2’, ‘Erez’) "multi_match" : { "query" : "Erez", "fields" : [ "first_name^1.0", "last_name^2.0"] }

Slide 26

Slide 26 text

Full-Text Search SELECT … WHERE QUERY(‘Alain fox’) "query_string" : { "query" : “Alain fox" }

Slide 27

Slide 27 text

Aggregate Ex: Avg, Count, Max Map to aggregations Functions Scalar Ex: Date Extraction, Math Map to scripting and/or custom processing

Slide 28

Slide 28 text

Group By & Having SELECT column_name, aggregate_function(column_name) FROM ... WHERE ... GROUP BY column_name HAVING aggregate_function(column_name) operator value;

Slide 29

Slide 29 text

SELECT column_name, aggregate_function(column_name) FROM ... WHERE ... GROUP BY column_name HAVING aggregate_function(column_name) operator value; Group By & Having Bucket Agg Pipeline Agg Metric Agg

Slide 30

Slide 30 text

Supported through Nested and Parent-Child Joins employees birth_date emp_no first_name gender last_name salary tenure department dept_name dept_no from_date to_date 1..n Nested docs SELECT dep.dept_name, last_name FROM emp.emp WHERE emp_no = 10020 SELECT last_name FROM emp.emp WHERE dep.dept_name = 'Production'

Slide 31

Slide 31 text

Clients

Slide 32

Slide 32 text

OS independent Quick diagnostics and sanity checks Admin focused Optimized for efficiency CLI

Slide 33

Slide 33 text

REST Language independent Elasticsearch specific

Slide 34

Slide 34 text

Dedicated client (driver) and server component JDBC 4.2/Java 8 (downgrade possible) Supports java.sql and javax.sql APIs Pays attention to details – Timeouts (connect vs read vs network) – Logging Light, without dependencies (safe for embedding) JDBC

Slide 35

Slide 35 text

Leverage existing rich eco-system JDBC

Slide 36

Slide 36 text

DEMO

Slide 37

Slide 37 text

Q & A Thank you!