Costin Leau
Elasticsearch SQL Lead
@costinl
Elasticsearch SQL
Slide 2
Slide 2 text
Agenda
What is / is not Elasticsearch SQL ?
What’s in a query?
Clients
1
2
3
Slide 3
Slide 3 text
What is / is not
Elasticsearch SQL
Slide 4
Slide 4 text
What is Elasticsearch SQL
SQL interface to Elasticsearch, currently:
• read-only, single index/type
• mix of ANSI SQL (subset) and Elasticsearch (50%/50%)
• in-development
Same out-of-the-box experience:
• Lightweight and fast
• Easy to pick-up
• No external dependencies/moving parts
• Elasticsearch underneath
• Part of X-Pack
Slide 5
Slide 5 text
What Elasticsearch SQL is NOT
database
complete ANSI SQL implementation (see above)
process external / outside Elasticsearch
replacement to data modeling
Slide 6
Slide 6 text
DEMO
Slide 7
Slide 7 text
The life of a query
Slide 8
Slide 8 text
A SQL interpreter anatomy
Analysis Planning Execution
Parsing
SQL QUERY
Unresolved
AST
Resolved/Logical
Plan
Optimized
Plan
Physical
Plan
Client
Results
Slide 9
Slide 9 text
SQL landscape at a glance
RDBMS
Cloud solutions
Dedicated SQL storage
Generic, in-memory compute/SQL
Slide 10
Slide 10 text
Stop fighting the backend layer
Analyzed vs Not-analyzed
Analyzers
Scoring
Abstractions get in the way
Slide 11
Slide 11 text
Declaration vs execution order
1. SELECT
2. FROM
3. WHERE
4. GROUP BY
5. HAVING
6. ORDER BY
2. FROM
3. WHERE
4. GROUP BY
5. HAVING
1. SELECT
6. ORDER BY
Slide 12
Slide 12 text
Parsing
SELECT * FROM emp.emp WHERE emp_no > 10010
Analysis example: resolving hidden references
SELECT last_name FROM emp.emp ORDER BY emp_no
Slide 15
Slide 15 text
Optimizer
Changes the plan without affecting its output
Logical optimizations
• Operator simplifications
• Operator pruning
• Execution simplification
• Execution elimination
Slide 16
Slide 16 text
Optimizer: Operator pruning & simplification
SELECT last_name FROM emp.emp WHERE gender = 'M' AND 1000 < emp_no AND
NOT FALSE OR last_name LIKE 'K%' AND emp_no > 10010
Query Planner / Executor
"aggregation" : {
"10" : { "max" : {
field : "salary“},
"12" : { "min" : {
field : “tenure"
}
}
max | min
-------------+------------
84013 |28
94918 |23
...
Slide 19
Slide 19 text
19
Elasticsearch is a distributed,
RESTful search and analytics
engine
elastic.co/products/elasticsearch
Slide 20
Slide 20 text
Analyzed
Full-text search
Default for strings (text)
Actual value depends on the analyzer
By default has a not-analyzed field
(keyword)
Analyzed vs not-analyzed
Not-analyzed
Aggregations and exact matching
Default for non-strings
Required for sorting and aggregations
Aggregate
Ex: Avg, Count, Max
Map to aggregations
Functions
Scalar
Ex: Date Extraction, Math
Map to scripting and/or custom
processing
Slide 28
Slide 28 text
Group By & Having
SELECT column_name, aggregate_function(column_name)
FROM ...
WHERE ...
GROUP BY column_name
HAVING aggregate_function(column_name) operator value;
Slide 29
Slide 29 text
SELECT column_name, aggregate_function(column_name)
FROM ...
WHERE ...
GROUP BY column_name
HAVING aggregate_function(column_name) operator value;
Group By & Having
Bucket Agg
Pipeline Agg
Metric Agg
Slide 30
Slide 30 text
Supported through Nested and Parent-Child
Joins
employees
birth_date
emp_no
first_name
gender
last_name
salary
tenure
department
dept_name
dept_no
from_date
to_date
1..n
Nested docs
SELECT dep.dept_name, last_name
FROM emp.emp WHERE emp_no = 10020
SELECT last_name FROM emp.emp
WHERE dep.dept_name = 'Production'
Slide 31
Slide 31 text
Clients
Slide 32
Slide 32 text
OS independent
Quick diagnostics and sanity checks
Admin focused
Optimized for efficiency
CLI
Slide 33
Slide 33 text
REST
Language independent
Elasticsearch specific
Slide 34
Slide 34 text
Dedicated client (driver) and server component
JDBC 4.2/Java 8 (downgrade possible)
Supports java.sql and javax.sql APIs
Pays attention to details
– Timeouts (connect vs read vs network)
– Logging
Light, without dependencies (safe for embedding)
JDBC