Upgrade to Pro — share decks privately, control downloads, hide ads and more …

How we built SQL driver for in-memory data grid

How we built SQL driver for in-memory data grid

Oleksii Diagiliev

December 07, 2017
Tweet

More Decks by Oleksii Diagiliev

Other Decks in Programming

Transcript

  1. How we built SQL driver for
    in-memory data grid
    Oleksiy Dyagilev
    Chief Software Engineer, Epam

    View full-size slide

  2. About me
    Contribute to GigaSpaces projects:
    • InsightEdge: Data Analytics Platform powered by Spark http://github.com/InsightEdge
    • XAP: in-memory data grid http://github.com/xap
    Blogging at http://dyagilev.org/

    View full-size slide

  3. About GigaSpaces XAP in 5 mins

    View full-size slide

  4. Scale-out
    In-Memory Storage
    Scale by sharding or replication
    Low latency and High
    throughput
    In-memory storage, fast
    indexing, co-located ops, batch
    ops
    API and query language
    SQL-like against POJOs and
    schema-free documents. JDBC
    driver, lots of limitations
    ACID transactions
    Maintain full ACID compliance
    against your data set through
    full transaction semantics
    High availability and
    resiliency
    Fault tolerance through
    replication, cross-data center
    replication, auto-healing
    Data Tiering
    RAM and SSD

    View full-size slide

  5. What customers want: plug-and-play BI

    View full-size slide

  6. BI connector architecture
    BI tools ODBC JDBC XAP
    Typically supported
    ODBC/JDBC bridge software (EasySoft, OpenLink)
    The missing part

    View full-size slide

  7. We need an ODBC/JDBC driver. ANSI
    compliant!
    • What are the options?
    • Enhance existing driver
    • custom SQL parser
    • extended SQL with ability to query embedded objects, geolocation
    • Create a new one
    • leverage industry-standard SQL parser, optimizer, JDBC driver
    • reimplement custom operations (though BI don’t support it)
    ”Modern relational query optimizers are complex pieces of code
    and typically represent 40 to 50 developer-years of effort”
    Raghu Ramakrishnan and Johannes Gehrke.
    Database Management Systems. Mc Graw Hill, 2000

    View full-size slide

  8. Stages of query execution
    • Parse
    • parse SQL query
    • Validate
    • validate against metadata
    • Optimize
    • logical plan is optimized and converted into physical expression
    • Execute
    • physical plan is converted into application-specific execution
    SQL
    Relational
    algebra
    Runnable
    query
    plan

    View full-size slide

  9. Apache Calcite
    https://www.slideshare.net/julianhyde/apache-calcite-overview
    • Top level Apache project since Oct, 2015
    • Query planning framework
    • Relational algebra, rewrite rules, cost model,
    extensible
    • Packaging
    • Library (JDBC server optional)
    • Adoption
    • Embedded: Lingual (SQL interface for Cascading),
    Apache Drill, Apache Hive, Apache Kylin
    • Adapters: Splunk, JDBC, CSV, Phoenix, etc

    View full-size slide

  10. Relational Algebra
    SELECT d.name, COUNT(*) as c
    FROM Employee e
    JOIN Department d ON e.depId = d.id
    WHERE e.age < 30
    GROUP BY d.id
    HAVING count(*) > 5
    ORDER BY c DESC
    Scan [Emp] Scan [Dept]
    Join [e.depId = d.id]
    Filter [e.age < 30]
    Aggregate [id, count(*) as c]
    Filter [c > 5]
    Project [name, c]
    Sort [c DESC]

    View full-size slide

  11. Calcite query optimizer
    • Volcano planner (Goetz Graefe, 1991) implementation
    • Query optimizations are modeled as pluggable rules
    • Planner rules operate by looking for patterns in the query parse tree

    View full-size slide

  12. Example: FilterIntoJoinRule optimization
    SELECT d.name, COUNT(*) as c
    FROM Employee e
    JOIN Department d ON e.depId = d.id
    WHERE e.age < 30
    GROUP BY d.id
    HAVING count(*) > 5
    ORDER BY c DESC
    Scan [Emp] Scan [Dept]
    Join [e.depId = d.id]
    Filter [e.age < 30]
    Aggregate [id, count(*) as c]
    Filter [c > 5]
    Project [name, c]
    Sort [c DESC]
    Scan [Emp] Scan [Dept]
    Join [e.depId = d.id]
    Filter [e.age < 30]
    Aggregate [id, count(*) as c]
    Filter [c > 5]
    Project [name, c]
    Sort [c DESC]
    FilterIntoJoinRule

    View full-size slide

  13. The key idea: adapter
    • Push as much work as possible to the grid side (predicates, projects,
    joins, sorting)
    • Leave to Calcite implementation everything we cannot handle on the
    grid
    Custom
    optimization rules
    XAP grid (server side)
    Calcite (client side)
    Logical plan Physical plan

    View full-size slide

  14. Example
    SELECT e.id, e.name, e.departmentId
    FROM Employee e
    JOIN Department d ON e.departmentId = d.id
    WHERE d.id = 2
    ORDER BY e.name DESC
    EnumerableSort(sort0=[$1], dir0=[DESC])
    EnumerableCalc(expr#0..3=[{inputs}], id=[$t2], name=[$t3], departmentId=[$t1])
    XapToEnumerableConverter
    XapJoin(condition=[=($1, $0)], joinType=[inner])
    XapProject(id=[$0])
    XapFilter(condition=[=(CAST($0):INTEGER, 2)])
    XapTableScan(table=[[sql-test-space, Department]])
    XapProject(departmentId=[$0], id=[$2], name=[$5])
    XapTableScan(table=[[sql-test-space, Employee]])
    XAP grid (server side)
    Calcite (client side)
    XAP grid
    Calcite

    View full-size slide

  15. XapFilterRule
    • Demo in IDE

    View full-size slide

  16. Query processing
    • Flat query (project, filter, sort)
    • Aggregated query (aggregate)
    • Join query

    View full-size slide

  17. Flat/aggregated query execution
    Client
    Partition 1
    Partition 2
    Partition 3
    native XAP API
    XAP grid
    • XAP SQLQuery API to execute flat query
    • XAP Aggregators API for aggregated query

    View full-size slide

  18. JOIN traits
    • Collocated / non-collocated.
    • Join tables are partitioned(routed) on the condition column
    • Equality conjunction (EquiJoin)
    • Join condition only consists of equality conjunctions, e.g. t1.c1 = t2.c2 AND
    t3.c2=t2.c4 AND ...

    View full-size slide

  19. Collocated JOIN
    Client
    Partition 1
    distributed task
    • Join tables are partitioned(routed) on the condition column
    • All the data is available on a partition.
    • No data movement.
    • Distributed Task (kind-of MapReduce)
    • Join in mapper
    • Union record in reducer
    • Scales well
    Table A
    Table B
    Partition 2
    Table A
    Table B
    Partition 3
    Table A
    Table B

    View full-size slide

  20. Non-collocated JOIN
    • A portion of ‘Probe’ table is located on every partition
    • The whole ‘Build’ table is loaded on every partition(the missing data is moved)
    Client
    Partition 1
    distributed task
    Table A
    Table B
    Partition 2
    Table A
    Table B
    Partition 3
    Table A
    Table B
    ‘Build’ table movement

    View full-size slide

  21. JOIN Algorithms
    • Classic JOIN algorithms:
    • Hash Join (primary one, only applicable with Equality Conjunction)
    • Nested loop (fallback)
    • Merge Sort Join (currently not used)

    View full-size slide

  22. Hash Join
    • Smaller relation is used as a ’build’, the other one as a ‘probe’
    • The build should fit into the partition memory. Otherwise we get
    OOM
    • We scan ‘probe’ in place in a streaming fashion to allocate minimum
    memory and avoid GC

    View full-size slide

  23. Testing
    • Unit tests
    • Run embedded grid instance
    • Check explain plan
    • Check query results
    • Check what XAP native query was pushed-down using Mockito Spy
    • TPC-DS benchmark

    View full-size slide

  24. Unit tests
    @Test
    public void testOrderJoinProjectFilter() {
    CalciteAssert.that()
    .with(connectionFactory)
    .query("SELECT e.id, e.name, e.departmentId " +
    "FROM Employee e " +
    "JOIN Department d ON e.departmentId = d.id " +
    "WHERE d.id = 2 " +
    "ORDER BY e.name DESC")
    .returnsOrdered(
    "id=5; name=Smith; departmentId=2",
    "id=4; name=Ford; departmentId=2",
    "id=3; name=Clark; departmentId=2",
    "id=6; name=Adams; departmentId=2"
    )
    .explainContains("PLAN=EnumerableSort(sort0=[$1], dir0=[DESC])\n" +
    " EnumerableCalc(expr#0..3=[{inputs}], id=[$t2], name=[$t3], departmentId=[$t1])\n" +
    " XapToEnumerableConverter\n" +
    " XapJoin(condition=[=($1, $0)], joinType=[inner])\n" +
    " XapProject(id=[$0])\n" +
    " XapFilter(condition=[=(CAST($0):INTEGER, 2)])\n" +
    " XapTableScan(table=[[sql-test-space, Department]])\n" +
    " XapProject(departmentId=[$0], id=[$2], name=[$5])\n" +
    " XapTableScan(table=[[sql-test-space, Employee]])");
    }

    View full-size slide

  25. TPC-DS benchmark
    • TPC-DS is the de-facto industry standard benchmark for measuring
    the performance of decision support solutions including, but not
    limited to, Big Data systems.
    • Can run 40% of queries
    • Revealed a couple of bugs
    • Revealed bugs in Calcite
    • Some queries take too much time in the planner

    View full-size slide

  26. Future potential work
    • Statistics (cost model)
    • Enables JOIN ordering optimization
    • Multi-join push down
    • Move Calcite runtime to the grid side
    • Use SQL struct type to support embedded objects

    View full-size slide

  27. JOIN ordering
    TPC-DS Q3:
    SELECT dt.d_year, item.i_brand_id brand_id, item.i_brand brand, SUM(ss_ext_sales_price) sum_agg
    FROM date_dim dt, store_sales, item
    WHERE dt.d_date_sk = store_sales.ss_sold_date_sk AND
    store_sales.ss_item_sk = item.i_item_sk AND
    item.i_manufact_id = 436 AND
    dt.d_moy = 12
    GROUP BY dt.d_year , item.i_brand , item.i_brand_id
    ORDER BY dt.d_year , sum_agg DESC, brand_id
    LIMIT 10
    fact table: store_sales (82B)
    two dimension table: date_dim (73K) and item (462K)

    View full-size slide

  28. Cardinality
    Pictures taken from https://hortonworks.com/blog/hive-0-14-cost-based-optimizer-cbo-technical-overview/

    View full-size slide

  29. Two JOIN orderings
    Pictures taken from https://hortonworks.com/blog/hive-0-14-cost-based-optimizer-cbo-technical-overview/

    View full-size slide

  30. Cardinality Estimation. Intuition behind
    Histograms.

    View full-size slide

  31. Selection query
    For logical AND expression, its filter selectivity is
    fs(a AND b) = fs(a) * fs (b)
    For logical-OR expression, its filter selectivity
    fs (a OR b) = fs (a) + fs (b) - fs (a AND b) = fs (a) + fs (b) – (fs (a) * fs (b))
    For logical-NOT expression, its filter selectivity is
    fs (NOT a) = 1.0 - fs (a)
    f – number of tuples in the bucket
    dv – number of distinct values in the bucket

    View full-size slide

  32. Join queries

    View full-size slide