Enable SQL/JDBC Access to Apache Geode/GemFire Using Apache Calcite

Enable SQL/JDBC Access to Apache Geode/GemFire Using Apache Calcite

https://springoneplatform.io/sessions/enable-sql-jdbc-access-to-apache-geode-gemfire-using-apache-calcite

When working with BigData & IoT systems we often feel the need for an established, Common Query Language.

To fill this gap some NoSql vendors are building SQL access to their systems. Building SQL engine from scratch is a daunting job and frameworks like Apache Calcite can help you with the heavy lifting. It allows you to integrate SQL parser, Cost-Based Optimizer, and JDBC with your NoSql system. Calcite has been used to empower many BigData platforms such as Hive, Spark, Flink, Drill, HBase/Phoenix to name some.

In this session I will walk you through the process of building a SQL access layer for Apache Geode (GemFire). I will share my experience, pitfalls and technical consideration like balancing between the SQL/RDBMS semantics and the design choices and limitations of In-Memory-Data-Grid systems like Geode.

Hopefully this will enable you to add SQL capabilities to your preferred NoSQL data system.

B53016292e87bca26da88fc940070c4f?s=128

Christian Tzolov

December 07, 2017
Tweet

Transcript

  1. Enable SQL/JDBC Access to Apache Geode/GemFire Using Apache Calcite By

    Christian Tzolov @christzolov 1
  2. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Christian Tzolov Spring Software Engineer at Pivotal Spring Cloud Data Flow Apache Committer, Crunch PMC member Apache {Crunch, HAWQ, Zeppelin, Calcite...} 2 blog.tzolov.net @christzolov @tzolov
  3. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ “What happens if an established NoSQL database decides to implement a reasonably standard SQL; The only predictable outcome for such an eventuality is plenty of argument.” Martin Fowler, P.J.Sadalage, NoSQL Distilled, 2012 3
  4. Disclaimer

  5. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Safe Harbor Statement The following is intended to outline the general direction of Pivotal's offerings. It is intended for information purposes only and may not be incorporated into any contract. Any information regarding pre-release of Pivotal offerings, future updates or other planned modifications is subject to ongoing evaluation by Pivotal and is subject to change. This information is provided without warranty or any kind, express or implied, and is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions regarding Pivotal's offerings. These purchasing decisions should only be based on features currently available. The development, release, and timing of any features or functionality described for Pivotal's offerings in this presentation remain at the sole discretion of Pivotal. Pivotal has no obligation to update forward looking information in this presentation. 5
  6. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Demo? In a minute ;) 6
  7. Data Management Systems

  8. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ 8
  9. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ One Size Fit All* “The traditional DBMS architecture has been used to support many data-centric applications with widely varying characteristics and requirements… We argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser.” Michael Stonebraker, 2005 9
  10. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Collection of Independent Databases 10 How to ingest and keep consistent across DBs? How to access the data across multiple DBs?
  11. Data Federation

  12. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Federation: Unified Query Interface 12
  13. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Federation: Two Mapping Tasks 13
  14. Federation: Data Model Mapping

  15. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Geode Model Mapping • PDX serialization • Primitive Types • Arrays (employ[0]) • Nested Objects (parent[‘child.childChild…’]) 15 Cache Region 1 Region K Val Key v1 k1 v2 k2 Schema Table 1 Table K C o l 1 C o l 2 C o l N V(M,1) R o w M V(M,2) V(M,N ) V(2,1) R o w 2 V(2,2) V(2,N) V(1,1) R o w 1 V(1,2) V(1,N) 2. Regions into Tables 1. Cache into Schema 3. Key/Value into Table Row 4. Column Types inferred from PDX fields
  16. Federation: Query Mapping

  17. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ SQL for NoSQL? • Extended Relational Algebra - already present in most NoSql data • Relational Expression Optimization – Desirable but hard to implement 17
  18. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Query = Relational Algebra 18 SELECT b.totalPrice, c.firstName FROM BookOrder AS b INNER JOIN BookCustomer AS c ON b.customerNumber = c.customerNumber WHERE b.totalPrice > 0 … } Scan Scan Join Filter Project BookCustomer [c] BookOrder [b] (on customerNumber) (b.totalPrice > 0) (c.firstName, b.totalPrice) Scan Scan Join Project BookCustomer [c] BookOrder [b] (on customerNumber) (totalPrice > 0) (c.firstName, b.totalPrice) Project (firstName, customerNumber) Filter (totalPrice, customerNumber) Project <<Optimization>> <<SQL Parsing>>
  19. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ “Native” Geode Operations 19 Relational Operation Supported on Geode SORT/LIMIT YES (no Offset) PROJECT YES FILTER YES AGGREGATE SUM, AVG, MAX, MIN, COUNT, GROUP BY, DISTINCT JOIN NO
  20. SQL Adapter Framework

  21. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Apache Calcite • Java Framework • SQL Query Parser, Validator and Optimizer(s) • JDBC drivers - local and remote • Agnostic to data storage and processing • Powered by Calcite: 21
  22. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Apache Geode Adapter 22 Geode API and OQL SQL/JDBC Convert relational expressions into OQL queries Geode Adapter (Geode Client) Geode Server Geode Server Geode Server Data Data Data Push down all supported expressions to Geode OQL and fall back to Calcite Enumerable for the rest Enumerable Adapter Apache Calcite Parse SQL into relational expression and optimizes Calcite-Geode-Embedded(e.g.JDBC.jar)
  23. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Apache Geode Concepts Cache Server Cache Region 1 Region N Val Key v1 k1 v2 k2 Cache - In-memory collection of Regions Region - consistent, distributed Map (key-value), Partitioned or Replicated CacheServer – process connected to the distributed system with created Cache Client Locator Client –read and modify the content of the distributed system Locator – tracks system members and provides membership information … Listeners Functions Functions – distributed, concurrent data processing Listener – event handler. Registers for one or more events and notified when they occur
  24. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Key/Value vs. OQL “to get extreme scale we use GemFire for what it does best: key-value storage. Then, when we can’t design our way to using what GemFire does best, we can use OQL and treat it like an object data-base” Mike Stolz, “Scaling Data Services with Pivotal GemFire®”, 2018
  25. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Query Execution Flow 25 2. Parse SQL, convert to rel. expressions. Validate and Optimize them 3. Start building a physical plan from the relation expressions 4. Implement the Geode relations and encode them as Expression tree 5. Pass the Expression tree to the Interpreter to generate Java code 6. Generate and Compile a Binder instance that on ‘bind()’ call runs Geodes’ query method 1. On new SQL query JDBC delegates to Prepare to prepare the query execution 7. JDBC uses the newly compiled Binder to perform the query on the Geode Cluster
  26. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Query Mapping - Example 26 SELECT author FROM BookMaster WHERE retailCost > 0 LIMIT 1 … } SELECT author FROM BookMaster WHERE retailCost > 0 LIMIT 1 OFFSET 1 … }
  27. Demo

  28. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ DB Tools Integration 28 Access Apache Geode / GemFire over SQL/JDBC Explore Geode & GemFire Data with IntelliJ SQL/Database tool
  29. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Analytics 29 Advanced Apache Geode Data Analytics with Apache Zeppelin over SQL/JDBC
  30. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Data Federation 30 {Geode|Greenplum|...} Data Federation
  31. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Spatial (OpenGIS /PostGIS) 31
  32. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ SQL Stream (in progress...) 32 https://calcite.apache.org/docs/stream.html
  33. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ SQL Stream (2) 33 • How many rows come out for each row in? • Does each incoming value appear in one total, or more? • What defines the “window”, the set of rows that contribute to a given output row? • Is the result a stream or a relation? • tumbling window (GROUP BY) • hopping window (multi GROUP BY) • sliding window (window functions) • cascading window (window functions)
  34. Usability?

  35. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Pros & Cons • Ad-hoc data exploration • JDBC integration with 3rd party tools • Data Federation, correlate Geode with other data sources • SQL Streaming as CQ++ • No-intrusive and extensible approach • What happened with SQLFire? • Geode == Transactional System! SQL+Geode <> Analytical System! • Key/Value vs. Full Scan • Overhead: SQL > OQL > Functions • Data at Rest (Table) vs. Data at Motion (Stream)
  36. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Future work • Add Geode Adapter to Calcite Project: [CALCITE-2059] – VOTE! ;) • Table Materialization • SQL Streaming (https://calcite.apache.org/docs/stream.html) • Pre-defined vs. Inferred Schema • “Native” JOIN operation implementation • Geode Indexes and Query statistics as Calcite metadata. Hint Indexes • Mixing Key/Value and OQL data access • Explore Geode functions as an alternative to OQL • Explore TX support • Performance Benchmark • Calcite Spatial (https://calcite.apache.org/docs/spatial.html)
  37. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Credits to Dan Baskette for inspiring this work and to Roman Shaposhnik for helping spread the word 37
  38. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ References 38 [1 ] Apache Geode: http://geode.apache.org [2] Geode Object Query Language (OQL): http://bit.ly/2BfXmNR [3] Apache Calcite: https://calcite.apache.org [4] Calcite - supported SQL language: https://calcite.apache.org/docs/reference.html [5] Apache Geode Adapter for Apache Calcite: https://github.com/tzolov/calcite/tree/geode-1.3 [6] Relational Algebra Operations: http://bit.ly/2zX8cMc [7] Apache Phoenix - http://phoenix.apache.org [8] Big Data is Four Different Problems, 2016, M.Stonebraker (video): http://bit.ly/2jDmKpq [9] Architecture of a Database System, 2007 (J.M. Hellerstein, M. Stonebraker, J. Hamilton): http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf [10] One Size Fits All (Michael Stonebraker): http://ieeexplore.ieee.org/document/1410100/?denied
  39. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software,

    Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ Bonus: Geode Adapter Implementation 39 Common interface for all MyAdapter Relation Expressions. Provides implementation callback method called as part of physical plan implementation … … <<instance of>> <<create>> Can convert queries in Expression myQuery() implements the call to your DB It is called by the auto generated code. It must return an Enumberable instance Registers all MyAdapter Rules <<create>> operands: (RelNode.class, MyAdapterConvention, EnumerableConvention) … <<create on match >> <<create on match >> Recursively call the implement on each MyAdapter Relation Expression Encode the myQuery(params) call as Expressions
  40. Learn More. Stay Connected. Vote for [CALCITE-2059] and don’t miss:

    Simplifying Apache Geode with Spring Data Exploring Data-Driven, Cognitive Capabilities in Pivotal Cloud Foundry Orchestrating Data Microservices with Spring Cloud Data Flow 40 #springone @s1p