Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Elastic{ON} 2018 - Elasticsearch SQL

Elastic{ON} 2018 - Elasticsearch SQL

Elastic Co

March 01, 2018
Tweet

More Decks by Elastic Co

Other Decks in Technology

Transcript

  1. 3 Elasticsearch SQL will be opened Part of X-Pack Features

    available based on the license Development and issue tracking in the open Plan for first public release: X-Pack 6.3
  2. 5 What is Elasticsearch SQL ? SQL interface to Elasticsearch,

    currently: • read-only, single mapping (multiple indices allowed if compatible) • mix of ANSI SQL (subset) and Elasticsearch (50%/50%) Adhere to ES experince • Lightweight and fast • Easy to pick-up • No external dependencies/moving parts • Elasticsearch underneath • Part of X-Pack
  3. 6 What Elasticsearch SQL is NOT database complete ANSI SQL

    implementation (see above) process external / outside Elasticsearch replacement to data modeling
  4. 7 Tap into Elasticsearch through SQL Use a declarative language

    Perceive Elasticsearch as a tabular source
  5. 9 Queries SELECT * FROM emp SELECT last_name AS l

    FROM emp SELECT last_name, first_name FROM emp ORDER BY emp_no SELECT … FROM emp WHERE emp_no < 100 AND salary > 50000 SELECT FLOOR(salary/12) s FROM emp WHERE tenure > 10 SELECT last_name l FROM emp WHERE YEAR(birth_date) > 1960 SELECT first_name f FROM emp WHERE QUERY(‘one neo`) SELECT last_name l FROM emp WHERE MATCH(first_name, ‘Thomas’) SELECT first_name FROM emp WHERE MATCH(‘first_name, last_name’, ‘Anderson’) ORDER BY SCORE()
  6. 10 Queries - aggregation SELECT gender g FROM emp GROUP

    BY g SELECT gender g, COUNT(*) c, AVG(salary) FROM emp GROUP BY g SELECT tenure t, MAX(salary) FROM emp GROUP BY t HAVING AVG(salary) BETWEEN 30000 AND 50000 SELECT MAX(salary), MIN(salary) FROM emp WHERE emp_no > 300 GROUP BY tenure HAVING MAX(salary) - MIN(salary) > 5000
  7. 11 Elasticsearch JOINs emp birth_date emp_no first_name gender last_name salary

    tenure dep dept_name dept_no from_date to_date 1..n Nested docs SELECT dep.dept_name, last_name FROM emp WHERE emp_no = 10020 SELECT last_name FROM emp WHERE dep.dept_name = 'Production'
  8. 13 Behind the scenes Analysis Planning Execution Parsing SQL QUERY

    Unresolved AST Resolved/Logical Plan Optimized Plan Physical Plan Client Results
  9. 16 Optimizer SELECT last_name FROM emp WHERE gender = 'M'

    AND 1000 < emp_no AND NOT FALSE OR last_name LIKE 'K%' AND emp_no > 10010
  10. 17 Query Planner Converts the logical plan into a physical

    plan SELECT last_name FROM emp WHERE emp_no > 10010 {"query" : { "range" : { "emp_no" : { "from" : 10010, "to" : null }}}, "_source" : { "includes" :["last_name"] } }
  11. 18 Runs the actual query Streams results Keep track of

    references Handles tabular format (aggs as a tree vs as a table) Data type conversion Query Executor
  12. 21 Transport: HTTP & REST All clients used HTTP /

    REST as a unified protocol No extra ports required Easy proxying or tunneling Currently JSON based, CBOR appealing for *DBC clients
  13. 24 REST – xpack/sql/translate See the resulting query DSL {"query"

    : { "range" : { "emp_no" : { "from" : 10010, "to" : null }}}, "_source" : { "includes" :["last_name"] } } SELECT last_name FROM emp WHERE emp_no > 10010
  14. 26 JDBC Dedicated client (driver) and server component JDBC 4.2/Java

    8 Supports java.sql and javax.sql APIs Pays attention to details ‒ Timeouts (connect vs read vs network) ‒ Logging WIP- Light, without dependencies (safe for embedding)
  15. 27 ODBC Written in C Target release: X-Pack 7.0.0 Standards

    • driver: 3.80, Unicode ‒ SPI support: planned • applications: 2.x, 3.x Platforms • Windows 7+: x86 and x64 • iODBC (*nix): planned Work in progress
  16. 29 Enhanced GROUP BY Currently GROUP BY is limited Move

    away from terms to composite agg Handle multiple GROUP BYs properly • Better scalability • Expected semantics Likely affect ORDER BY features
  17. 31 Introduce Histograms SELECT AVG(salary) FROM emp GROUP BY HISTOGRAM(hired_date,

    `1m`) SELECT AVG(salary) FROM emp GROUP BY HISTOGRAM(age, 10)
  18. 32 Parent-Child support / Flattening emp birth_date emp_no first_name gender

    last_name salary tenure dep dept_name dept_no from_date to_date 1..n Parent / Child SELECT dep.dept_name, last_name FROM emp WHERE emp_no = 10020 SELECT last_name FROM emp WHERE dep.dept_name = 'Production'
  19. 33 Cross-Cluster SQL SELECT * FROM remote:index Based on Cross-Cluster

    Search More work needed in getting remote metadata
  20. 34 Geo support Implement OGS Simple Features 1.2.1* • Intersects,

    Disjoint, Within, Contains, Envelope, Distance • asText, asBinary, dimensions – supported by ES internally • support reasonable set out of the 150 functions • Well-Known-Format – Well-Known-Text is possible, Well-Known-Binary wip SELECT Disjoint(centerlines, boundary) FROM divided_routes WHERE divided_routes.name = 'Route 75'
  21. 35 Geo support – beyond OGC Expose ES specific geo

    filtering and aggs SELECT GEO_BOUNDS(*) FROM … WHERE name=”musee” SELECT GEO_CENTROID() FROM … GROUP BY city SELECT * FROM … WHERE GEO_BOUNDS(top left, bottom_right) GROUP BY GEO_GRID(location)
  22. 36 Subselects Restricted to same index Special case of self,

    equijoin Quite popular with BI tools SELECT last_name l WHERE tenure > 10 FROM (SELECT * FROM emp WHERE salary > 50000)
  23. 38 Clients items - JDBC Lightweight - use stand-alone XContent

    (being pulled out of Elasticearch) Think about JVM/JDK versions • Upgrades are tempting • Users tend to be conservative
  24. 44 Engine internal items Pattern matching (wip in Java 10,

    11) Migration to java.time from Joda (aligned with Elasticsearch)