Elastic
A Standard Query Language
for Elasticsearch
Costin Leau
costin / @costinl
Slide 2
Slide 2 text
Ess-que-el or Sequel?
Slide 3
Slide 3 text
SQL Semantics
SQL Engine for ES
Closer look at Elasticsearch SQL
3
How does it work?
Slide 4
Slide 4 text
ANSI SQL
Slide 5
Slide 5 text
SQL characteristics
Declarative
Performs data definition, manipulation and control
Fourth-generation programming language (4GL)
Slide 6
Slide 6 text
SQL characteristics
Declarative
Performs data definition, manipulation and control
SQL:1999 is Turing complete
Can do fractals
Fourth-generation programming language (4GL) https://wiki.postgresql.org/wiki/Mandelbrot_set
Slide 7
Slide 7 text
7
SELECT * FROM emp.emp
SELECT last_name, first_date FROM emp.emp
WHERE tenure > 20
What to do, not how to do it
Declarative Language
Slide 8
Slide 8 text
1. SELECT
2. FROM
3. WHERE
4. GROUP BY
5. HAVING
6. ORDER BY
Declaration vs execution order
8
2. FROM
3. WHERE
4. GROUP BY
5. HAVING
1. SELECT
6. ORDER BY
Slide 9
Slide 9 text
Data/Tabular format
SQL uses table references (not columns)
Data is ‘pipelined’ to the user
• consumers move back and forth
• each row has a backing catalog/table (source)
• data might be updated
Slide 10
Slide 10 text
Elasticsearch SQL
Slide 11
Slide 11 text
Support Elasticsearch features as much as possible
Support standard SQL as much as possible
Same experience out of the box as Elasticsearch
• Lightweight & fast
• Easy to pick up
• Still Elasticsearch
Whiteboard Goal – Elasticsearch SQL
Slide 12
Slide 12 text
SQL landscape at a glance
RDBMS
Cloud solutions
Dedicated SQL storage
Generic, in-memory compute/SQL
Slide 13
Slide 13 text
A SQL interpreter anatomy
Analysis Planning Execution
Parsing
SQL QUERY
Unresolved
AST
Resolved/Logical
Plan
Optimized
Plan
Physical
Plan
Client
Results
Slide 14
Slide 14 text
14
Reusable Customized Highly-specific
• Parsing
• Domain model
• Optimizer
• Parsing
• Domain model (plans)
• Optimizer
• Analysis
• Physical plan
• Execution
Using an existing SQL foundation
Slide 15
Slide 15 text
Analyzed vs Not-analyzed
Analyzers
Scoring
Abstractions got in the way
Stop fighting the data layer
Slide 16
Slide 16 text
SQL Engine
Slide 17
Slide 17 text
Read-only SQL
Targets single index/type
Features: 50% SQL, 50% Elasticsearch
Runs inside Elasticsearch
Zero runtime footprint
Current iteration
Analysis – aliasing
SELECT e.last_name l FROM emp.emp e WHERE emp_no > 10010
Slide 22
Slide 22 text
Analysis – resolving hidden references
SELECT last_name FROM emp.emp ORDER BY emp_no
Slide 23
Slide 23 text
Changes the plan without affecting its output
Logical optimizations
• Operator simplifications
• Operator pruning
• Execution simplification
• Execution elimination
Optimizer
Slide 24
Slide 24 text
Optimizer
SELECT last_name FROM emp.emp WHERE gender = 'M' AND 1000 < emp_no AND NOT
FALSE OR last_name LIKE 'K%' AND emp_no > 10010
Slide 25
Slide 25 text
Query Planner
Converts the logical plan into a physical plan
SELECT last_name FROM emp.emp WHERE emp_no > 10010
{"query" : {
"range" : {
"emp_no" : {
"from" : 10010,
"to" : null
}}},
"_source" : {
"includes" :["last_name"]
}
}
Slide 26
Slide 26 text
Runs the actual query
Streams results
Keep track of references
Handles tabular format (aggs as a tree vs as a table)
Data type conversion
Query Executor
Slide 27
Slide 27 text
elastic.co/products/elasticsearch
{ }
Elasticsearch is a distributed,
RESTful search and analytics
engine
Slide 28
Slide 28 text
Analyzed
Full-text search
Default for strings (text)
Actual value depends on the analyzer
By default has a not-analyzed field
(keyword)
Analyzed vs not-analyzed
Not-analyzed
Aggregations and exact matching
Default for non-strings
Required for sorting and aggregations
Aggregate
Ex: Avg, Count, Max
Map onto metrics aggregations
Functions
Scalar
Ex: Date extraction
Painlessly through scripting
Monotonic functions are well optimized
Slide 36
Slide 36 text
Group By & Having
SELECT column_name, aggregate_function(column_name)
FROM ...
WHERE ...
GROUP BY column_name
HAVING aggregate_function(column_name) operator value;
Slide 37
Slide 37 text
SELECT column_name, aggregate_function(column_name)
FROM ...
WHERE ...
GROUP BY column_name
HAVING aggregate_function(column_name) operator value;
Group By & Having
Bucket Agg
Pipeline Agg
Metric Agg
Slide 38
Slide 38 text
Supported through Nested and Parent-Child
Joins
employees
birth_date
emp_no
first_name
gender
last_name
salary
tenure
department
dept_name
dept_no
from_date
to_date
1..n
Nested docs
SELECT dep.dept_name, last_name
FROM emp.emp WHERE emp_no = 10020
SELECT last_name FROM emp.emp
WHERE dep.dept_name = 'Production'
Slide 39
Slide 39 text
Clients
Slide 40
Slide 40 text
OS independent
Quick diagnostics and sanity checks
Admin focused
Optimized for efficiency
CLI
Slide 41
Slide 41 text
REST
Language independent
Elasticsearch specific
Slide 42
Slide 42 text
Dedicated client (driver) and server component
JDBC 4.2/Java 8 (downgrade possible)
Supports java.sql and javax.sql APIs
Pays attention to details
• Timeouts (connect vs read vs network)
• Logging
Light, without dependencies
JDBC
Slide 43
Slide 43 text
Leverage the existing rich eco-system
JDBC
Slide 44
Slide 44 text
44
More Questions?
Visit us at the AMA
Slide 45
Slide 45 text
www.elastic.co
Slide 46
Slide 46 text
Demo: Elasticsearch SQL
Slide 47
Slide 47 text
Except where otherwise noted, this work is licensed under
http://creativecommons.org/licenses/by-nd/4.0/
Creative Commons and the double C in a circle are
registered trademarks of Creative Commons in the United States and other countries.
Third party marks and brands are the property of their respective holders.
47
Please attribute Elastic with a link to elastic.co