runtime for a few years now, and some customer patterns have emerged. • The state of the art today is to write Flink jobs in Java or Scala using the DataStream/DataSet APIs and/or the Table API. • While powerful, the time and expertise required are non-trivial, so adoption and time to market lag. • Teams are busy writing code. Completely swamped, to be precise.
It’s massively useful for inspecting and reasoning about data. Everyone knows SQL. • It’s declarative: just ask for what you want to see. • It’s been extended to accommodate streaming constructs like windows (Flink/Calcite). • A streaming SQL query never completes; it’s a query over unbounded data. • It’s an amazing way to interact with streaming data.
SQL: do it interactively, manage schemas, and make it all easy? Could building logic on streams be as productive and intuitive as using a database, yet as scalable and powerful as Flink?
schema of typed columns; streams don’t have to have this. • It’s common to use Avro (easy to solve for) but also free-form JSON. • Free-form means: a total F**ing mess. • Sources: Kafka/Kinesis (soon) • Sinks: Kafka, S3, JDBC, ELK (soon)
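To make the free-form JSON problem concrete, here is a small Python sketch (the field names and records are invented for illustration) of why untyped JSON resists mapping onto a schema of typed columns: the same field can carry a different type in every record.

```python
import json

# Three hypothetical records from the same topic: the "temp" field
# arrives as a number, as a string, and then is missing entirely.
raw = [
    '{"sensorid": 1, "temp": 21.5}',
    '{"sensorid": 2, "temp": "21.5C"}',
    '{"sensorid": 3}',
]

def observed_types(records, field):
    """Collect the set of Python types seen for one field."""
    types = set()
    for line in records:
        value = json.loads(line).get(field)
        types.add(type(value).__name__)
    return types

# A typed column needs exactly one type; free-form JSON gives us three.
print(sorted(observed_types(raw, "temp")))  # ['NoneType', 'float', 'str']
```

Avro sidesteps this because every record is written against a declared schema; with raw JSON, any schema has to be inferred or enforced after the fact.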
Console: a grid of columns and sampled values; if a class exists, show class, method, params. Table API snippet:
.connect(new Kafka()
    .version("0.11")
    .topic("...")
    .sinkPartitionerXX
result.writeToSink(..);
env.execute(..);
- Enhanced schema typing
- Enhanced feedback/logging
- Sends a base64-encoded payload to the Flink job and samples the data back to the user
of sinks, building complex processing pipelines • Aggregate data before pushing to expensive/slow database endpoints • Conditionally write to multiple S3 buckets
... Example queries:
• SELECT sensorid, max(temp) FROM stream GROUP BY sensorid, tumble(..)
• SELECT sensorid, region FROM stream WHERE region IN [...]
• SELECT * FROM table WHERE user_selected_thing = 'foo';
• SELECT sensorid, message FROM stream WHERE is_alert = 't'
Example sinks: s3://xxx/yyy, s3://xxx/yyy, sms, Data Science / ML team(s), SnowflakeDB or other data warehouse