Upgrade to Pro — share decks privately, control downloads, hide ads and more …

How we built SQL driver for in-memory data grid

How we built SQL driver for in-memory data grid

Oleksii Diagiliev

December 07, 2017
Tweet

More Decks by Oleksii Diagiliev

Other Decks in Programming

Transcript

  1. How we built SQL driver for
    in-memory data grid
    Oleksiy Dyagilev
    Chief Software Engineer, Epam

    View Slide

  2. About me
    Contribute to GigaSpaces projects:
    • InsightEdge: Data Analytics Platform powered by Spark http://github.com/InsightEdge
    • XAP: in-memory data grid http://github.com/xap
    Blogging at http://dyagilev.org/

    View Slide

  3. About GigaSpaces XAP in 5 mins

    View Slide

  4. Scale-out
    In-Memory Storage
    Scale by sharding or replication
    Low latency and High
    throughput
    In-memory storage, fast
    indexing, co-located ops, batch
    ops
    API and query language
    SQL-like against POJOs and
    schema-free documents. JDBC
    driver, lots of limitations
    ACID transactions
    Maintain full ACID compliance
    against your data set through
    full transaction semantics
    High availability and
    resiliency
    Fault tolerance through
    replication, cross-data center
    replication, auto-healing
    Data Tiering
    RAM and SSD

    View Slide

  5. What customers want: plug-and-play BI

    View Slide

  6. BI connector architecture
    BI tools ODBC JDBC XAP
    Typically supported
    ODBC/JDBC bridge software (EasySoft, OpenLink)
    The missing part

    View Slide

  7. We need an ODBC/JDBC driver. ANSI
    compliant!
    • What are the options?
    • Enhance existing driver
    • custom SQL parser
    • extended SQL with ability to query embedded objects, geolocation
    • Create a new one
    • leverage industry-standard SQL parser, optimizer, JDBC driver
    • reimplement custom operations (though BI don’t support it)
    ”Modern relational query optimizers are complex pieces of code
    and typically represent 40 to 50 developer-years of effort”
    Raghu Ramakrishnan and Johannes Gehrke.
    Database Management Systems. Mc Graw Hill, 2000

    View Slide

  8. Stages of query execution
    • Parse
    • parse SQL query
    • Validate
    • validate against metadata
    • Optimize
    • logical plan is optimized and converted into physical expression
    • Execute
    • physical plan is converted into application-specific execution
    SQL
    Relational
    algebra
    Runnable
    query
    plan

    View Slide

  9. Apache Calcite
    https://www.slideshare.net/julianhyde/apache-calcite-overview
    • Top level Apache project since Oct, 2015
    • Query planning framework
    • Relational algebra, rewrite rules, cost model,
    extensible
    • Packaging
    • Library (JDBC server optional)
    • Adoption
    • Embedded: Lingual (SQL interface for Cascading),
    Apache Drill, Apache Hive, Apache Kylin
    • Adapters: Splunk, JDBC, CSV, Phoenix, etc

    View Slide

  10. Relational Algebra
    SELECT d.name, COUNT(*) as c
    FROM Employee e
    JOIN Department d ON e.depId = d.id
    WHERE e.age < 30
    GROUP BY d.id
    HAVING count(*) > 5
    ORDER BY c DESC
    Scan [Emp] Scan [Dept]
    Join [e.depId = d.id]
    Filter [e.age < 30]
    Aggregate [id, count(*) as c]
    Filter [c > 5]
    Project [name, c]
    Sort [c DESC]

    View Slide

  11. Calcite query optimizer
    • Volcano planner (Goetz Graefe, 1991) implementation
    • Query optimizations are modeled as pluggable rules
    • Planner rules operate by looking for patterns in the query parse tree

    View Slide

  12. Example: FilterIntoJoinRule optimization
    SELECT d.name, COUNT(*) as c
    FROM Employee e
    JOIN Department d ON e.depId = d.id
    WHERE e.age < 30
    GROUP BY d.id
    HAVING count(*) > 5
    ORDER BY c DESC
    Scan [Emp] Scan [Dept]
    Join [e.depId = d.id]
    Filter [e.age < 30]
    Aggregate [id, count(*) as c]
    Filter [c > 5]
    Project [name, c]
    Sort [c DESC]
    Scan [Emp] Scan [Dept]
    Join [e.depId = d.id]
    Filter [e.age < 30]
    Aggregate [id, count(*) as c]
    Filter [c > 5]
    Project [name, c]
    Sort [c DESC]
    FilterIntoJoinRule

    View Slide

  13. The key idea: adapter
    • Push as much work as possible to the grid side (predicates, projects,
    joins, sorting)
    • Leave to Calcite implementation everything we cannot handle on the
    grid
    Custom
    optimization rules
    XAP grid (server side)
    Calcite (client side)
    Logical plan Physical plan

    View Slide

  14. Example
    SELECT e.id, e.name, e.departmentId
    FROM Employee e
    JOIN Department d ON e.departmentId = d.id
    WHERE d.id = 2
    ORDER BY e.name DESC
    EnumerableSort(sort0=[$1], dir0=[DESC])
    EnumerableCalc(expr#0..3=[{inputs}], id=[$t2], name=[$t3], departmentId=[$t1])
    XapToEnumerableConverter
    XapJoin(condition=[=($1, $0)], joinType=[inner])
    XapProject(id=[$0])
    XapFilter(condition=[=(CAST($0):INTEGER, 2)])
    XapTableScan(table=[[sql-test-space, Department]])
    XapProject(departmentId=[$0], id=[$2], name=[$5])
    XapTableScan(table=[[sql-test-space, Employee]])
    XAP grid (server side)
    Calcite (client side)
    XAP grid
    Calcite

    View Slide

  15. XapFilterRule
    • Demo in IDE

    View Slide

  16. Query processing
    • Flat query (project, filter, sort)
    • Aggregated query (aggregate)
    • Join query

    View Slide

  17. Flat/aggregated query execution
    Client
    Partition 1
    Partition 2
    Partition 3
    native XAP API
    XAP grid
    • XAP SQLQuery API to execute flat query
    • XAP Aggregators API for aggregated query

    View Slide

  18. JOIN traits
    • Collocated / non-collocated.
    • Join tables are partitioned(routed) on the condition column
    • Equality conjunction (EquiJoin)
    • Join condition only consists of equality conjunctions, e.g. t1.c1 = t2.c2 AND
    t3.c2=t2.c4 AND ...

    View Slide

  19. Collocated JOIN
    Client
    Partition 1
    distributed task
    • Join tables are partitioned(routed) on the condition column
    • All the data is available on a partition.
    • No data movement.
    • Distributed Task (kind-of MapReduce)
    • Join in mapper
    • Union record in reducer
    • Scales well
    Table A
    Table B
    Partition 2
    Table A
    Table B
    Partition 3
    Table A
    Table B

    View Slide

  20. Non-collocated JOIN
    • A portion of ‘Probe’ table is located on every partition
    • The whole ‘Build’ table is loaded on every partition(the missing data is moved)
    Client
    Partition 1
    distributed task
    Table A
    Table B
    Partition 2
    Table A
    Table B
    Partition 3
    Table A
    Table B
    ‘Build’ table movement

    View Slide

  21. JOIN Algorithms
    • Classic JOIN algorithms:
    • Hash Join (primary one, only applicable with Equality Conjunction)
    • Nested loop (fallback)
    • Merge Sort Join (currently not used)

    View Slide

  22. Hash Join
    • Smaller relation is used as a ’build’, the other one as a ‘probe’
    • The build should fit into the partition memory. Otherwise we get
    OOM
    • We scan ‘probe’ in place in a streaming fashion to allocate minimum
    memory and avoid GC

    View Slide

  23. Testing
    • Unit tests
    • Run embedded grid instance
    • Check explain plan
    • Check query results
    • Check what XAP native query was pushed-down using Mockito Spy
    • TPC-DS benchmark

    View Slide

  24. Unit tests
    @Test
    public void testOrderJoinProjectFilter() {
    CalciteAssert.that()
    .with(connectionFactory)
    .query("SELECT e.id, e.name, e.departmentId " +
    "FROM Employee e " +
    "JOIN Department d ON e.departmentId = d.id " +
    "WHERE d.id = 2 " +
    "ORDER BY e.name DESC")
    .returnsOrdered(
    "id=5; name=Smith; departmentId=2",
    "id=4; name=Ford; departmentId=2",
    "id=3; name=Clark; departmentId=2",
    "id=6; name=Adams; departmentId=2"
    )
    .explainContains("PLAN=EnumerableSort(sort0=[$1], dir0=[DESC])\n" +
    " EnumerableCalc(expr#0..3=[{inputs}], id=[$t2], name=[$t3], departmentId=[$t1])\n" +
    " XapToEnumerableConverter\n" +
    " XapJoin(condition=[=($1, $0)], joinType=[inner])\n" +
    " XapProject(id=[$0])\n" +
    " XapFilter(condition=[=(CAST($0):INTEGER, 2)])\n" +
    " XapTableScan(table=[[sql-test-space, Department]])\n" +
    " XapProject(departmentId=[$0], id=[$2], name=[$5])\n" +
    " XapTableScan(table=[[sql-test-space, Employee]])");
    }

    View Slide

  25. TPC-DS benchmark
    • TPC-DS is the de-facto industry standard benchmark for measuring
    the performance of decision support solutions including, but not
    limited to, Big Data systems.
    • Can run 40% of queries
    • Revealed a couple of bugs
    • Revealed bugs in Calcite
    • Some queries take too much time in the planner

    View Slide

  26. Future potential work
    • Statistics (cost model)
    • Enables JOIN ordering optimization
    • Multi-join push down
    • Move Calcite runtime to the grid side
    • Use SQL struct type to support embedded objects

    View Slide

  27. JOIN ordering
    TPC-DS Q3:
    SELECT dt.d_year, item.i_brand_id brand_id, item.i_brand brand, SUM(ss_ext_sales_price) sum_agg
    FROM date_dim dt, store_sales, item
    WHERE dt.d_date_sk = store_sales.ss_sold_date_sk AND
    store_sales.ss_item_sk = item.i_item_sk AND
    item.i_manufact_id = 436 AND
    dt.d_moy = 12
    GROUP BY dt.d_year , item.i_brand , item.i_brand_id
    ORDER BY dt.d_year , sum_agg DESC, brand_id
    LIMIT 10
    fact table: store_sales (82B)
    two dimension table: date_dim (73K) and item (462K)

    View Slide

  28. Cardinality
    Pictures taken from https://hortonworks.com/blog/hive-0-14-cost-based-optimizer-cbo-technical-overview/

    View Slide

  29. Two JOIN orderings
    Pictures taken from https://hortonworks.com/blog/hive-0-14-cost-based-optimizer-cbo-technical-overview/

    View Slide

  30. Cardinality Estimation. Intuition behind
    Histograms.

    View Slide

  31. Selection query
    For logical AND expression, its filter selectivity is
    fs(a AND b) = fs(a) * fs (b)
    For logical-OR expression, its filter selectivity
    fs (a OR b) = fs (a) + fs (b) - fs (a AND b) = fs (a) + fs (b) – (fs (a) * fs (b))
    For logical-NOT expression, its filter selectivity is
    fs (NOT a) = 1.0 - fs (a)
    f – number of tuples in the bucket
    dv – number of distinct values in the bucket

    View Slide

  32. Join queries

    View Slide

  33. Questions

    View Slide