How we built SQL driver for in-memory data grid

How we built SQL driver for in-memory data grid Oleksiy
Dyagilev Chief Software Engineer, Epam

About me Contribute to GigaSpaces projects: • InsightEdge: Data Analytics
Platform powered by Spark http://github.com/InsightEdge • XAP: in-memory data grid http://github.com/xap Blogging at http://dyagilev.org/

About GigaSpaces XAP in 5 mins

Scale-out In-Memory Storage Scale by sharding or replication Low latency
and High throughput In-memory storage, fast indexing, co-located ops, batch ops API and query language SQL-like against POJOs and schema-free documents. JDBC driver, lots of limitations ACID transactions Maintain full ACID compliance against your data set through full transaction semantics High availability and resiliency Fault tolerance through replication, cross-data center replication, auto-healing Data Tiering RAM and SSD

What customers want: plug-and-play BI

BI connector architecture BI tools ODBC JDBC XAP Typically supported
ODBC/JDBC bridge software (EasySoft, OpenLink) The missing part

We need an ODBC/JDBC driver. ANSI compliant! • What are
the options? • Enhance existing driver • custom SQL parser • extended SQL with ability to query embedded objects, geolocation • Create a new one • leverage industry-standard SQL parser, optimizer, JDBC driver • reimplement custom operations (though BI don’t support it) ”Modern relational query optimizers are complex pieces of code and typically represent 40 to 50 developer-years of effort” Raghu Ramakrishnan and Johannes Gehrke. Database Management Systems. Mc Graw Hill, 2000

Stages of query execution • Parse • parse SQL query
• Validate • validate against metadata • Optimize • logical plan is optimized and converted into physical expression • Execute • physical plan is converted into application-specific execution SQL Relational algebra Runnable query plan

Apache Calcite https://www.slideshare.net/julianhyde/apache-calcite-overview • Top level Apache project since Oct,
2015 • Query planning framework • Relational algebra, rewrite rules, cost model, extensible • Packaging • Library (JDBC server optional) • Adoption • Embedded: Lingual (SQL interface for Cascading), Apache Drill, Apache Hive, Apache Kylin • Adapters: Splunk, JDBC, CSV, Phoenix, etc

Relational Algebra SELECT d.name, COUNT(*) as c FROM Employee e
JOIN Department d ON e.depId = d.id WHERE e.age < 30 GROUP BY d.id HAVING count(*) > 5 ORDER BY c DESC Scan [Emp] Scan [Dept] Join [e.depId = d.id] Filter [e.age < 30] Aggregate [id, count(*) as c] Filter [c > 5] Project [name, c] Sort [c DESC]

Calcite query optimizer • Volcano planner (Goetz Graefe, 1991) implementation
• Query optimizations are modeled as pluggable rules • Planner rules operate by looking for patterns in the query parse tree

Example: FilterIntoJoinRule optimization SELECT d.name, COUNT(*) as c FROM Employee
e JOIN Department d ON e.depId = d.id WHERE e.age < 30 GROUP BY d.id HAVING count(*) > 5 ORDER BY c DESC Scan [Emp] Scan [Dept] Join [e.depId = d.id] Filter [e.age < 30] Aggregate [id, count(*) as c] Filter [c > 5] Project [name, c] Sort [c DESC] Scan [Emp] Scan [Dept] Join [e.depId = d.id] Filter [e.age < 30] Aggregate [id, count(*) as c] Filter [c > 5] Project [name, c] Sort [c DESC] FilterIntoJoinRule

The key idea: adapter • Push as much work as
possible to the grid side (predicates, projects, joins, sorting) • Leave to Calcite implementation everything we cannot handle on the grid Custom optimization rules XAP grid (server side) Calcite (client side) Logical plan Physical plan

Example SELECT e.id, e.name, e.departmentId FROM Employee e JOIN Department
d ON e.departmentId = d.id WHERE d.id = 2 ORDER BY e.name DESC EnumerableSort(sort0=[$1], dir0=[DESC]) EnumerableCalc(expr#0..3=[{inputs}], id=[$t2], name=[$t3], departmentId=[$t1]) XapToEnumerableConverter XapJoin(condition=[=($1, $0)], joinType=[inner]) XapProject(id=[$0]) XapFilter(condition=[=(CAST($0):INTEGER, 2)]) XapTableScan(table=[[sql-test-space, Department]]) XapProject(departmentId=[$0], id=[$2], name=[$5]) XapTableScan(table=[[sql-test-space, Employee]]) XAP grid (server side) Calcite (client side) XAP grid Calcite

XapFilterRule • Demo in IDE

Query processing • Flat query (project, filter, sort) • Aggregated
query (aggregate) • Join query

Flat/aggregated query execution Client Partition 1 Partition 2 Partition 3
native XAP API XAP grid • XAP SQLQuery API to execute flat query • XAP Aggregators API for aggregated query

JOIN traits • Collocated / non-collocated. • Join tables are
partitioned(routed) on the condition column • Equality conjunction (EquiJoin) • Join condition only consists of equality conjunctions, e.g. t1.c1 = t2.c2 AND t3.c2=t2.c4 AND ...

Collocated JOIN Client Partition 1 distributed task • Join tables
are partitioned(routed) on the condition column • All the data is available on a partition. • No data movement. • Distributed Task (kind-of MapReduce) • Join in mapper • Union record in reducer • Scales well Table A Table B Partition 2 Table A Table B Partition 3 Table A Table B

Non-collocated JOIN • A portion of ‘Probe’ table is located
on every partition • The whole ‘Build’ table is loaded on every partition(the missing data is moved) Client Partition 1 distributed task Table A Table B Partition 2 Table A Table B Partition 3 Table A Table B ‘Build’ table movement

JOIN Algorithms • Classic JOIN algorithms: • Hash Join (primary
one, only applicable with Equality Conjunction) • Nested loop (fallback) • Merge Sort Join (currently not used)

Hash Join • Smaller relation is used as a ’build’,
the other one as a ‘probe’ • The build should fit into the partition memory. Otherwise we get OOM • We scan ‘probe’ in place in a streaming fashion to allocate minimum memory and avoid GC

Testing • Unit tests • Run embedded grid instance •
Check explain plan • Check query results • Check what XAP native query was pushed-down using Mockito Spy • TPC-DS benchmark

Unit tests @Test public void testOrderJoinProjectFilter() { CalciteAssert.that() .with(connectionFactory) .query("SELECT
e.id, e.name, e.departmentId " + "FROM Employee e " + "JOIN Department d ON e.departmentId = d.id " + "WHERE d.id = 2 " + "ORDER BY e.name DESC") .returnsOrdered( "id=5; name=Smith; departmentId=2", "id=4; name=Ford; departmentId=2", "id=3; name=Clark; departmentId=2", "id=6; name=Adams; departmentId=2" ) .explainContains("PLAN=EnumerableSort(sort0=[$1], dir0=[DESC])\n" + " EnumerableCalc(expr#0..3=[{inputs}], id=[$t2], name=[$t3], departmentId=[$t1])\n" + " XapToEnumerableConverter\n" + " XapJoin(condition=[=($1, $0)], joinType=[inner])\n" + " XapProject(id=[$0])\n" + " XapFilter(condition=[=(CAST($0):INTEGER, 2)])\n" + " XapTableScan(table=[[sql-test-space, Department]])\n" + " XapProject(departmentId=[$0], id=[$2], name=[$5])\n" + " XapTableScan(table=[[sql-test-space, Employee]])"); }

TPC-DS benchmark • TPC-DS is the de-facto industry standard benchmark
for measuring the performance of decision support solutions including, but not limited to, Big Data systems. • Can run 40% of queries • Revealed a couple of bugs • Revealed bugs in Calcite • Some queries take too much time in the planner

Future potential work • Statistics (cost model) • Enables JOIN
ordering optimization • Multi-join push down • Move Calcite runtime to the grid side • Use SQL struct type to support embedded objects

JOIN ordering TPC-DS Q3: SELECT dt.d_year, item.i_brand_id brand_id, item.i_brand brand,
SUM(ss_ext_sales_price) sum_agg FROM date_dim dt, store_sales, item WHERE dt.d_date_sk = store_sales.ss_sold_date_sk AND store_sales.ss_item_sk = item.i_item_sk AND item.i_manufact_id = 436 AND dt.d_moy = 12 GROUP BY dt.d_year , item.i_brand , item.i_brand_id ORDER BY dt.d_year , sum_agg DESC, brand_id LIMIT 10 fact table: store_sales (82B) two dimension table: date_dim (73K) and item (462K)

Cardinality Pictures taken from https://hortonworks.com/blog/hive-0-14-cost-based-optimizer-cbo-technical-overview/

Two JOIN orderings Pictures taken from https://hortonworks.com/blog/hive-0-14-cost-based-optimizer-cbo-technical-overview/

Cardinality Estimation. Intuition behind Histograms.

Selection query For logical AND expression, its filter selectivity is
fs(a AND b) = fs(a) * fs (b) For logical-OR expression, its filter selectivity fs (a OR b) = fs (a) + fs (b) - fs (a AND b) = fs (a) + fs (b) – (fs (a) * fs (b)) For logical-NOT expression, its filter selectivity is fs (NOT a) = 1.0 - fs (a) f – number of tuples in the bucket dv – number of distinct values in the bucket

Join queries

Questions

How we built SQL driver for in-memory data grid

How we built SQL driver for in-memory data grid

Oleksii Diagiliev

More Decks by Oleksii Diagiliev

Other Decks in Programming

Featured

Transcript