
Writing an Interactive Interface for SQL on Flink

October 08, 2019


This presentation goes into detail on how and why Eventador created SQLStreamBuilder for easy streaming SQL—and the lessons learned along the way.

This presentation was given by Eventador CEO and Co-founder Kenny Gorman at Flink Forward Europe 2019.

Transcript

  1. Writing an interactive interface for SQL on Flink: how and why we created SQLStreamBuilder, and the lessons learned along the way. Kenny Gorman, Co-Founder and CEO, www.eventador.io. 2019 Flink Forward Berlin.
  2. Background and motivations
    • Eventador.io has offered a managed Flink runtime for a few years now, and we started to see some customer patterns emerge.
    • The state of the art today is to write Flink jobs in Java or Scala using the DataStream/DataSet APIs and/or the Table API.
    • While powerful, the time and expertise needed aren't trivial; adoption and time to market lag.
    • Teams are busy writing code. Completely swamped, to be precise.
  3. Why SQL anyway?
    • SQL is more than 30 years old. It's massively useful for inspecting and reasoning about data, and everyone knows it.
    • It's declarative: just ask for what you want to see.
    • It's been extended to accommodate streaming constructs like windows (Flink/Calcite).
    • A streaming SQL query never completes; it's a query on boundless data.
    • It's an amazing way to interact with streaming data.
  4. 80% of workloads could be represented with SQL, and we plan to grow that. The other 20% require more complex logic best represented in Java/Scala.
  5. What if we could go beyond simply building processors in SQL: do it interactively, manage schemas, and make it all easy? Could building logic on streams be as productive and intuitive as using a database, yet as scalable and powerful as Flink?
  6. Eventador SQLStreamBuilder
    • Interactive SQL editor: create and submit any Flink-compatible SQL
    • Virtual Table Registry: source/sink + schema definition
    • Query Parser: gives instant feedback
    • Job payload management: builds job payloads
    • Flink runner: takes the payload and runs the job
    • Delivered as a cloud service, in your AWS account
  7. Feedback on SQL execution. Annotations on the editor screenshot:
    • Where do I send results?
    • Where to run the job
    • The SQL statement
    • Sampling rather than a result-set
    • A sample of results in browser
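Since a streaming query never completes, the console samples results rather than accumulating a result-set. A minimal Python sketch of that idea, assuming a bounded buffer of the most recent rows (the function name and buffer size are hypothetical, not SQLStreamBuilder's actual implementation):

```python
from collections import deque

def sample_stream(messages, max_rows=100):
    """Keep only the most recent max_rows records for display in the
    browser; older rows are discarded, never buffered indefinitely."""
    window = deque(maxlen=max_rows)
    for msg in messages:
        window.append(msg)
    return list(window)
```

A real implementation would push each batch to the browser as it arrives (the deck mentions Kafka/Socket.io for this), but the bounded-buffer idea is the same.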
  8. Schema management: Virtual Table Registry
    • SQL requires a schema of typed columns; streams don't have to have this.
    • It's common to use AVRO (easy to solve for) but also free-form JSON.
    • Free-form means: a total F**ing mess.
    • Sources: Kafka, Kinesis (soon)
    • Sinks: Kafka, S3, JDBC, ELK (soon)
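One way to tame free-form JSON is to infer a typed schema from a sample record. A rough Python sketch, assuming a flat document and a far cruder type mapping than the real Virtual Table Registry would use:

```python
import json

# Map Python JSON types to SQL-ish column types. This mapping is an
# assumption for illustration; the real product supports richer types.
TYPE_MAP = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE", str: "VARCHAR"}

def infer_schema(sample_record: str) -> dict:
    """Derive a flat column -> type mapping from one JSON sample record."""
    doc = json.loads(sample_record)
    return {key: TYPE_MAP.get(type(value), "VARCHAR") for key, value in doc.items()}
```

Note that one sample is rarely enough in practice: free-form JSON can change shape from message to message, which is exactly why the deck calls it a mess.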
  9. SQLStreamBuilder Components
    • Interactive SQL interface
      ◦ Handles query creation and submission
      ◦ Handles feedback from SQLIO
      ◦ Interface to build queries, sources, and sinks
      ◦ Python + Vue.js
      ◦ Results are sampled back to the interface
    • SQL engine (SQLIO)
      ◦ Parses incoming statements
      ◦ Maps data sources/sinks
      ◦ Parses schema (Schema Registry + AVRO / JSON)
      ◦ Builds POJOs
      ◦ Submits payload to runner (Flink)
      ◦ Java
    • Virtual Table Registry
      ◦ Creation of schemas for streams
      ◦ AVRO + JSON
      ◦ Python
  10. SQLStreamBuilder (cont'd)
    • Job Management Interface
      ◦ Stop/start/edit/etc.
      ◦ Python + Vue.js
      ◦ Uses Flink APIs
    • Builder
      ◦ Handles creation of assets via K8s
      ◦ Python
      ◦ PostgreSQL backend
      ◦ Kubernetes orchestration
    • Flink runner
      ◦ Runs jobs on Flink 1.8.2
      ◦ Kubernetes orchestration
      ◦ Any Flink-compatible SQL statement
  11. Query Lifecycle: Parse
    The SQL console sends an HTTPS POST to SQLIO:
      { "schema": { }, "topic": "myTopic", "virtualTable": "myTable", "sqlStmt": "SELECT …" }
    Result (if error):
      { "errorMsg": "unable to find table myTable" }
    On the engine side:
      final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
      ...
      Table parse = mytopic.select().sort() ...
    Steps: validate the SQL against Flink/Calcite; JSON -> AVRO; derive the AVRO schema; generate classes (POJOs).
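The parse round-trip above can be sketched as a simple validation step. Field names follow the POST body on the slide, but the function, its arguments, and the success shape are illustrative only:

```python
def parse_request(payload: dict, known_tables: set) -> dict:
    """Validate a parse request; return an error result (as on the slide)
    when the virtual table is unknown, else echo the statement back."""
    table = payload.get("virtualTable")
    if table not in known_tables:
        return {"errorMsg": f"unable to find table {table}"}
    return {"ok": True, "sqlStmt": payload["sqlStmt"]}
```

The real engine additionally runs the statement through Flink/Calcite validation before accepting it, so a known table name alone is not sufficient.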
  12. Query Lifecycle: Execute
    SQLIO checks whether the generated class already exists (class, method, params), applies enhanced schema typing and enhanced feedback/logging, then sends the base64-encoded payload to the Flink job:
      .connect(new Kafka()
        .version("0.11")
        .topic("...")
        .sinkPartitionerXX
      result.writeToSink(..);
      env.execute(..);
    The data is sampled back to the user: results appear as a column/value grid in the SQL console, delivered via Apache Kafka / Socket.io.
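The "base64 encoded payload" step might look like the following sketch, with a deliberately simplified payload shape; the field names are assumptions, and the real job payload carries much more (generated classes, connector configuration, and so on):

```python
import base64
import json

def encode_job_payload(sql_stmt: str, sink: str) -> str:
    """Serialize a minimal job description and base64-encode it for
    transport to the Flink runner (field names are illustrative)."""
    doc = json.dumps({"sqlStmt": sql_stmt, "sink": sink})
    return base64.b64encode(doc.encode("utf-8")).decode("ascii")
```

Base64 keeps the payload safe to pass through transports that are not 8-bit clean; the runner decodes and deserializes it before submitting the job.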
  13. Use cases:
    • SQL-join streams from multiple clusters/types
    • Write to multiple types of sinks, building complex processing pipelines
    • Aggregate data before pushing to expensive/slow database endpoints
    • Conditionally write to multiple S3 buckets
  14. Building Processing Environments. Example queries:
      SELECT * FROM sensors JOIN account_info ON ...
      SELECT sensorid, max(temp) FROM stream GROUP BY sensorid, tumble(..)
      SELECT sensorid, region FROM stream WHERE region IN [...]
      SELECT * FROM table WHERE user_selected_thing = ‘foo’;
      SELECT sensorid, message FROM stream WHERE is_alert = ‘t’
    Destinations shown in the diagram: s3://xxx/yyy, s3://xxx/yyy, sms, Data Science / ML team(s), SnowFlakeDB or other data warehouse.
  15. JavaScript User Functions (introduced today)
      function ICAO_lookup(icao) {
        try {
          var c = new java.net.URL('http://tornado.beebe.cc/' + icao).openConnection();
          c.requestMethod = 'GET';
          var reader = new java.io.BufferedReader(new java.io.InputStreamReader(c.inputStream));
          return reader.readLine();
        } catch (err) {
          return "Unknown: " + err;
        }
      }
      ICAO_lookup($p0);
  16. What's Next?
    • Read-consistent (snapshot) sink
    • Auto-detect and type schema
    • SQL API
    • Intelligent auto-scale
    • AWS Spot instance support/management