Slide 1

Slide 1 text

Elastic @costinl Elasticsearch SQL Costin Leau

Slide 2

Slide 2 text

2 We are opening X-Pack Shay Banon

Slide 3

Slide 3 text

3 Elasticsearch SQL will be opened Part of X-Pack Features available based on the license Development and issue tracking in the open Plan for first public release: X-Pack 6.3

Slide 4

Slide 4 text

4 4 What is / isn’t Elasticsearch SQL?

Slide 5

Slide 5 text

5 What is Elasticsearch SQL ? SQL interface to Elasticsearch, currently: • read-only, single mapping (multiple indices allowed if compatible) • mix of ANSI SQL (subset) and Elasticsearch (50%/50%) Adhere to ES experince • Lightweight and fast • Easy to pick-up • No external dependencies/moving parts • Elasticsearch underneath • Part of X-Pack

Slide 6

Slide 6 text

6 What Elasticsearch SQL is NOT database complete ANSI SQL implementation (see above) process external / outside Elasticsearch replacement to data modeling

Slide 7

Slide 7 text

7 Tap into Elasticsearch through SQL Use a declarative language Perceive Elasticsearch as a tabular source

Slide 8

Slide 8 text

8 8 Features

Slide 9

Slide 9 text

9 Queries SELECT * FROM emp SELECT last_name AS l FROM emp SELECT last_name, first_name FROM emp ORDER BY emp_no SELECT … FROM emp WHERE emp_no < 100 AND salary > 50000 SELECT FLOOR(salary/12) s FROM emp WHERE tenure > 10 SELECT last_name l FROM emp WHERE YEAR(birth_date) > 1960 SELECT first_name f FROM emp WHERE QUERY(‘one neo`) SELECT last_name l FROM emp WHERE MATCH(first_name, ‘Thomas’) SELECT first_name FROM emp WHERE MATCH(‘first_name, last_name’, ‘Anderson’) ORDER BY SCORE()

Slide 10

Slide 10 text

10 Queries - aggregation SELECT gender g FROM emp GROUP BY g SELECT gender g, COUNT(*) c, AVG(salary) FROM emp GROUP BY g SELECT tenure t, MAX(salary) FROM emp GROUP BY t HAVING AVG(salary) BETWEEN 30000 AND 50000 SELECT MAX(salary), MIN(salary) FROM emp WHERE emp_no > 300 GROUP BY tenure HAVING MAX(salary) - MIN(salary) > 5000

Slide 11

Slide 11 text

11 Elasticsearch JOINs emp birth_date emp_no first_name gender last_name salary tenure dep dept_name dept_no from_date to_date 1..n Nested docs SELECT dep.dept_name, last_name FROM emp WHERE emp_no = 10020 SELECT last_name FROM emp WHERE dep.dept_name = 'Production'

Slide 12

Slide 12 text

12 12 Architecture

Slide 13

Slide 13 text

13 Behind the scenes Analysis Planning Execution Parsing SQL QUERY Unresolved AST Resolved/Logical Plan Optimized Plan Physical Plan Client Results

Slide 14

Slide 14 text

14 SQL Parser SELECT * FROM emp WHERE emp_no > 10010

Slide 15

Slide 15 text

15 • Validates the query • Resolves tables, columns, functions, aliases, namespaces, etc… Analyzer

Slide 16

Slide 16 text

16 Optimizer SELECT last_name FROM emp WHERE gender = 'M' AND 1000 < emp_no AND NOT FALSE OR last_name LIKE 'K%' AND emp_no > 10010

Slide 17

Slide 17 text

17 Query Planner Converts the logical plan into a physical plan SELECT last_name FROM emp WHERE emp_no > 10010 {"query" : { "range" : { "emp_no" : { "from" : 10010, "to" : null }}}, "_source" : { "includes" :["last_name"] } }

Slide 18

Slide 18 text

18 Runs the actual query Streams results Keep track of references Handles tabular format (aggs as a tree vs as a table) Data type conversion Query Executor

Slide 19

Slide 19 text

19 19 Clients

Slide 20

Slide 20 text

20 Clients CLI POST /_sql Kibana / Canvas JDBC ODBC

Slide 21

Slide 21 text

21 Transport: HTTP & REST All clients used HTTP / REST as a unified protocol No extra ports required Easy proxying or tunneling Currently JSON based, CBOR appealing for *DBC clients

Slide 22

Slide 22 text

22 REST – xpack/sql Your usual REST endpoint But accepting SQL queries

Slide 23

Slide 23 text

23 REST - xpack/sql Your usual REST endpoint But accepting SQL queries

Slide 24

Slide 24 text

24 REST – xpack/sql/translate See the resulting query DSL {"query" : { "range" : { "emp_no" : { "from" : 10010, "to" : null }}}, "_source" : { "includes" :["last_name"] } } SELECT last_name FROM emp WHERE emp_no > 10010

Slide 25

Slide 25 text

25 CLI Useful for prototyping and sanity checks Portable (Java based) Protocol based on REST

Slide 26

Slide 26 text

26 JDBC Dedicated client (driver) and server component JDBC 4.2/Java 8 Supports java.sql and javax.sql APIs Pays attention to details ‒ Timeouts (connect vs read vs network) ‒ Logging WIP- Light, without dependencies (safe for embedding)

Slide 27

Slide 27 text

27 ODBC Written in C Target release: X-Pack 7.0.0 Standards • driver: 3.80, Unicode ‒ SPI support: planned • applications: 2.x, 3.x Platforms • Windows 7+: x86 and x64 • iODBC (*nix): planned Work in progress

Slide 28

Slide 28 text

28 28 Roadmap

Slide 29

Slide 29 text

29 Enhanced GROUP BY Currently GROUP BY is limited Move away from terms to composite agg Handle multiple GROUP BYs properly • Better scalability • Expected semantics Likely affect ORDER BY features

Slide 30

Slide 30 text

30 Expand Date-Time support Introduce INTERVAL syntax Expand time functions • Date math • Offset/Absolute/Relative

Slide 31

Slide 31 text

31 Introduce Histograms SELECT AVG(salary) FROM emp GROUP BY HISTOGRAM(hired_date, `1m`) SELECT AVG(salary) FROM emp GROUP BY HISTOGRAM(age, 10)

Slide 32

Slide 32 text

32 Parent-Child support / Flattening emp birth_date emp_no first_name gender last_name salary tenure dep dept_name dept_no from_date to_date 1..n Parent / Child SELECT dep.dept_name, last_name FROM emp WHERE emp_no = 10020 SELECT last_name FROM emp WHERE dep.dept_name = 'Production'

Slide 33

Slide 33 text

33 Cross-Cluster SQL SELECT * FROM remote:index Based on Cross-Cluster Search More work needed in getting remote metadata

Slide 34

Slide 34 text

34 Geo support Implement OGS Simple Features 1.2.1* • Intersects, Disjoint, Within, Contains, Envelope, Distance • asText, asBinary, dimensions – supported by ES internally • support reasonable set out of the 150 functions • Well-Known-Format – Well-Known-Text is possible, Well-Known-Binary wip SELECT Disjoint(centerlines, boundary) FROM divided_routes WHERE divided_routes.name = 'Route 75'

Slide 35

Slide 35 text

35 Geo support – beyond OGC Expose ES specific geo filtering and aggs SELECT GEO_BOUNDS(*) FROM … WHERE name=”musee” SELECT GEO_CENTROID() FROM … GROUP BY city SELECT * FROM … WHERE GEO_BOUNDS(top left, bottom_right) GROUP BY GEO_GRID(location)

Slide 36

Slide 36 text

36 Subselects Restricted to same index Special case of self, equijoin Quite popular with BI tools SELECT last_name l WHERE tenure > 10 FROM (SELECT * FROM emp WHERE salary > 50000)

Slide 37

Slide 37 text

37 Clients items - general Content-Type: text/csv, text/tsv Support custom types (GEO) Extend Time-zone support HTTP/2

Slide 38

Slide 38 text

38 Clients items - JDBC Lightweight - use stand-alone XContent (being pulled out of Elasticearch) Think about JVM/JDK versions • Upgrades are tempting • Users tend to be conservative

Slide 39

Slide 39 text

39 39 User feedback

Slide 40

Slide 40 text

40 40 Q & A

Slide 41

Slide 41 text

www.elastic.co

Slide 42

Slide 42 text

42 42 APPENDIX

Slide 43

Slide 43 text

43 Session concept Expand configuration-per-query Looking into session (client id) • config around queries GET/SET/RESET

Slide 44

Slide 44 text

44 Engine internal items Pattern matching (wip in Java 10, 11) Migration to java.time from Joda (aligned with Elasticsearch)