How we built SQL driver for in-memory data grid

How we built SQL driver for in-memory data grid


Oleksiy Dyagilev

December 07, 2017


  1. How we built SQL driver for in-memory data grid Oleksiy

    Dyagilev Chief Software Engineer, Epam
  2. About me Contribute to GigaSpaces projects: • InsightEdge: Data Analytics

    Platform powered by Spark • XAP: in-memory data grid Blogging at
  3. About GigaSpaces XAP in 5 mins

  4. Scale-out In-Memory Storage Scale by sharding or replication Low latency

    and High throughput In-memory storage, fast indexing, co-located ops, batch ops API and query language SQL-like against POJOs and schema-free documents. JDBC driver, lots of limitations ACID transactions Maintain full ACID compliance against your data set through full transaction semantics High availability and resiliency Fault tolerance through replication, cross-data center replication, auto-healing Data Tiering RAM and SSD
  5. What customers want: plug-and-play BI

  6. BI connector architecture BI tools ODBC JDBC XAP Typically supported

    ODBC/JDBC bridge software (EasySoft, OpenLink) The missing part
  7. We need an ODBC/JDBC driver. ANSI compliant! • What are

    the options? • Enhance existing driver • custom SQL parser • extended SQL with ability to query embedded objects, geolocation • Create a new one • leverage industry-standard SQL parser, optimizer, JDBC driver • reimplement custom operations (though BI don’t support it) ”Modern relational query optimizers are complex pieces of code and typically represent 40 to 50 developer-years of effort” Raghu Ramakrishnan and Johannes Gehrke. Database Management Systems. Mc Graw Hill, 2000
  8. Stages of query execution • Parse • parse SQL query

    • Validate • validate against metadata • Optimize • logical plan is optimized and converted into physical expression • Execute • physical plan is converted into application-specific execution SQL Relational algebra Runnable query plan
  9. Apache Calcite • Top level Apache project since Oct,

    2015 • Query planning framework • Relational algebra, rewrite rules, cost model, extensible • Packaging • Library (JDBC server optional) • Adoption • Embedded: Lingual (SQL interface for Cascading), Apache Drill, Apache Hive, Apache Kylin • Adapters: Splunk, JDBC, CSV, Phoenix, etc
  10. Relational Algebra SELECT, COUNT(*) as c FROM Employee e

    JOIN Department d ON e.depId = WHERE e.age < 30 GROUP BY HAVING count(*) > 5 ORDER BY c DESC Scan [Emp] Scan [Dept] Join [e.depId =] Filter [e.age < 30] Aggregate [id, count(*) as c] Filter [c > 5] Project [name, c] Sort [c DESC]
  11. Calcite query optimizer • Volcano planner (Goetz Graefe, 1991) implementation

    • Query optimizations are modeled as pluggable rules • Planner rules operate by looking for patterns in the query parse tree
  12. Example: FilterIntoJoinRule optimization SELECT, COUNT(*) as c FROM Employee

    e JOIN Department d ON e.depId = WHERE e.age < 30 GROUP BY HAVING count(*) > 5 ORDER BY c DESC Scan [Emp] Scan [Dept] Join [e.depId =] Filter [e.age < 30] Aggregate [id, count(*) as c] Filter [c > 5] Project [name, c] Sort [c DESC] Scan [Emp] Scan [Dept] Join [e.depId =] Filter [e.age < 30] Aggregate [id, count(*) as c] Filter [c > 5] Project [name, c] Sort [c DESC] FilterIntoJoinRule
  13. The key idea: adapter • Push as much work as

    possible to the grid side (predicates, projects, joins, sorting) • Leave to Calcite implementation everything we cannot handle on the grid Custom optimization rules XAP grid (server side) Calcite (client side) Logical plan Physical plan
  14. Example SELECT,, e.departmentId FROM Employee e JOIN Department

    d ON e.departmentId = WHERE = 2 ORDER BY DESC EnumerableSort(sort0=[$1], dir0=[DESC]) EnumerableCalc(expr#0..3=[{inputs}], id=[$t2], name=[$t3], departmentId=[$t1]) XapToEnumerableConverter XapJoin(condition=[=($1, $0)], joinType=[inner]) XapProject(id=[$0]) XapFilter(condition=[=(CAST($0):INTEGER, 2)]) XapTableScan(table=[[sql-test-space, Department]]) XapProject(departmentId=[$0], id=[$2], name=[$5]) XapTableScan(table=[[sql-test-space, Employee]]) XAP grid (server side) Calcite (client side) XAP grid Calcite
  15. XapFilterRule • Demo in IDE

  16. Query processing • Flat query (project, filter, sort) • Aggregated

    query (aggregate) • Join query
  17. Flat/aggregated query execution Client Partition 1 Partition 2 Partition 3

    native XAP API XAP grid • XAP SQLQuery API to execute flat query • XAP Aggregators API for aggregated query
  18. JOIN traits • Collocated / non-collocated. • Join tables are

    partitioned(routed) on the condition column • Equality conjunction (EquiJoin) • Join condition only consists of equality conjunctions, e.g. t1.c1 = t2.c2 AND t3.c2=t2.c4 AND ...
  19. Collocated JOIN Client Partition 1 distributed task • Join tables

    are partitioned(routed) on the condition column • All the data is available on a partition. • No data movement. • Distributed Task (kind-of MapReduce) • Join in mapper • Union record in reducer • Scales well Table A Table B Partition 2 Table A Table B Partition 3 Table A Table B
  20. Non-collocated JOIN • A portion of ‘Probe’ table is located

    on every partition • The whole ‘Build’ table is loaded on every partition(the missing data is moved) Client Partition 1 distributed task Table A Table B Partition 2 Table A Table B Partition 3 Table A Table B ‘Build’ table movement
  21. JOIN Algorithms • Classic JOIN algorithms: • Hash Join (primary

    one, only applicable with Equality Conjunction) • Nested loop (fallback) • Merge Sort Join (currently not used)
  22. Hash Join • Smaller relation is used as a ’build’,

    the other one as a ‘probe’ • The build should fit into the partition memory. Otherwise we get OOM • We scan ‘probe’ in place in a streaming fashion to allocate minimum memory and avoid GC
  23. Testing • Unit tests • Run embedded grid instance •

    Check explain plan • Check query results • Check what XAP native query was pushed-down using Mockito Spy • TPC-DS benchmark
  24. Unit tests @Test public void testOrderJoinProjectFilter() { CalciteAssert.that() .with(connectionFactory) .query("SELECT,, e.departmentId " + "FROM Employee e " + "JOIN Department d ON e.departmentId = " + "WHERE = 2 " + "ORDER BY DESC") .returnsOrdered( "id=5; name=Smith; departmentId=2", "id=4; name=Ford; departmentId=2", "id=3; name=Clark; departmentId=2", "id=6; name=Adams; departmentId=2" ) .explainContains("PLAN=EnumerableSort(sort0=[$1], dir0=[DESC])\n" + " EnumerableCalc(expr#0..3=[{inputs}], id=[$t2], name=[$t3], departmentId=[$t1])\n" + " XapToEnumerableConverter\n" + " XapJoin(condition=[=($1, $0)], joinType=[inner])\n" + " XapProject(id=[$0])\n" + " XapFilter(condition=[=(CAST($0):INTEGER, 2)])\n" + " XapTableScan(table=[[sql-test-space, Department]])\n" + " XapProject(departmentId=[$0], id=[$2], name=[$5])\n" + " XapTableScan(table=[[sql-test-space, Employee]])"); }
  25. TPC-DS benchmark • TPC-DS is the de-facto industry standard benchmark

    for measuring the performance of decision support solutions including, but not limited to, Big Data systems. • Can run 40% of queries • Revealed a couple of bugs • Revealed bugs in Calcite • Some queries take too much time in the planner
  26. Future potential work • Statistics (cost model) • Enables JOIN

    ordering optimization • Multi-join push down • Move Calcite runtime to the grid side • Use SQL struct type to support embedded objects
  27. JOIN ordering TPC-DS Q3: SELECT dt.d_year, item.i_brand_id brand_id, item.i_brand brand,

    SUM(ss_ext_sales_price) sum_agg FROM date_dim dt, store_sales, item WHERE dt.d_date_sk = store_sales.ss_sold_date_sk AND store_sales.ss_item_sk = item.i_item_sk AND item.i_manufact_id = 436 AND dt.d_moy = 12 GROUP BY dt.d_year , item.i_brand , item.i_brand_id ORDER BY dt.d_year , sum_agg DESC, brand_id LIMIT 10 fact table: store_sales (82B) two dimension table: date_dim (73K) and item (462K)
  28. Cardinality Pictures taken from

  29. Two JOIN orderings Pictures taken from

  30. Cardinality Estimation. Intuition behind Histograms.

  31. Selection query For logical AND expression, its filter selectivity is

    fs(a AND b) = fs(a) * fs (b) For logical-OR expression, its filter selectivity fs (a OR b) = fs (a) + fs (b) - fs (a AND b) = fs (a) + fs (b) – (fs (a) * fs (b)) For logical-NOT expression, its filter selectivity is fs (NOT a) = 1.0 - fs (a) f – number of tuples in the bucket dv – number of distinct values in the bucket
  32. Join queries

  33. Questions