Quickly Build Kafka Stream ETL System with KSETL

KSETL
• KSETL = Kafka Stream ETL
  – Extract, Transform, Load streams on Kafka
• Stream
  – Unbounded, continuously generated data
  – User logs
  – Sensor data

Why Stream Processing?
• To process data with low latency
• For business requirements
  – Delivery status and order status for food delivery
  – Fraud detection in financial transactions
• For better performance
  – Content recommendation

Stream processing system
• Batch processing systems are not enough
  – Daily → Hourly → Minutely → Secondly?
• Stream processing system
  – Reflects the latest data quickly

Building systems
• Many stream processing systems are needed
• Common work
  – Write and debug programs
  – Build programs
  – Deploy programs
  – Monitor programs
• Many data engineers do similar work

KSETL
• KSETL
  – Kafka Stream ETL for LINE
  – Input and output are Kafka topics
  – (Kafka is widely used for streams at LINE)
• Build stream processing systems easily
  – Let data engineers build their systems by themselves

Goal – Easiness
• Express ETL logic easily
  – Introduce SQL-like syntax
• Build systems easily
  – Create ksqlDB clusters dynamically on k8s

Express ETL logic easily
• For data engineers without programming expertise
  – Every data engineer knows SQL
• Prototyped using various SQL engines
  – ksqlDB, FlinkSQL, Spark Structured Streaming
  – Left join stream-stream
  – Query a table and write to a Kafka topic

ksqlDB
• Full features
  – Join
  – Window aggregation
  – User Defined Function (UDF)
• Based on Kafka Streams API
  – Easy to understand
• Only for stream processing
  – No extra parts for batch processing
• Everything on Kafka
  – Good Kafka team in LINE

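Internals of ksqlDB Join
[Diagram: a left topic and a right topic, each with partitions p1 and p2. "Join partition1" reads p1 of both topics and "Join partition2" reads p2 of both topics. Each join partition has a local state store. Also shown: a left changelog topic and a right changelog topic (p1, p2), and a joined topic (p1, p2).]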
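
The partition-wise join on this slide can be sketched in plain Python. This is a minimal simulation under stated assumptions, not ksqlDB or Kafka Streams code: `partition_for`, `JoinTask`, and `send` are hypothetical names, `crc32` stands in for Kafka's real partitioner, and the join has no time window. The changelog topics from the diagram are not modeled; here the local stores are just in-memory dictionaries.

```python
from collections import defaultdict
from zlib import crc32

NUM_PARTITIONS = 2  # matches p1 and p2 on the slide


def partition_for(key: str) -> int:
    """Map a record key to a partition (crc32 is a stand-in for Kafka's partitioner)."""
    return crc32(key.encode("utf-8")) % NUM_PARTITIONS


class JoinTask:
    """One join task; it only sees one partition of each input topic."""

    def __init__(self) -> None:
        self.left_store = defaultdict(list)   # local state store for left records
        self.right_store = defaultdict(list)  # local state store for right records

    def on_left(self, key, value):
        # Remember the left record, then emit a pair for every stored right record.
        self.left_store[key].append(value)
        return [(key, value, right) for right in self.right_store[key]]

    def on_right(self, key, value):
        # Remember the right record, then emit a pair for every stored left record.
        self.right_store[key].append(value)
        return [(key, left, value) for left in self.left_store[key]]


tasks = [JoinTask() for _ in range(NUM_PARTITIONS)]


def send(side: str, key: str, value: str):
    """Route a record to the join task that owns its key's partition."""
    task = tasks[partition_for(key)]
    handler = task.on_left if side == "left" else task.on_right
    return handler(key, value)


joined = []
joined += send("left", "req-1", "request")
joined += send("right", "req-1", "click")       # same key, same task: joins
joined += send("right", "req-2", "impression")  # no left record with this key
```

Because both topics are partitioned by the same key function, records with equal keys always reach the same task, so each task can join using only its own local stores.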

Build systems easily
• Create ksqlDB clusters dynamically
  – ODA (On-Demand Applications)
• Provide logging/monitoring facilities
• Run queries against a ksqlDB cluster

KSETL ODA Architecture
• Many ksqlDB clusters in a k8s cluster

KSETL Logging/Monitoring

Summary

            Traditional     KSETL
Language    Java, Scala     SQL
Build       Compile         Interactive shell
Deploy      CI/CD tools     On-demand cluster
Monitor     Custom tools    Prebuilt dashboards

Example system
• AB test report
  – LINE runs AB tests before releasing new features
  – Request logs from LINE server
    • 50k logs/sec at peak time
  – Event (impression, click) logs from LINE client
  – Find the client reaction for each request and aggregate
  – Stream join and windowed aggregation required

Previous AB test report
• Previous system to join streams
  – Event log and request log with the same key
  – Store event (impression, click) logs to Redis
  – Delay request logs and look up Redis to implement the join window
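
The store-then-delayed-lookup approach described on this slide can be sketched as follows. This is only an illustration of the idea, not the previous system's code: a plain dictionary stands in for Redis, and `DelayedLookupJoin`, `JOIN_WINDOW_SEC`, and the 60-second delay are hypothetical (the talk does not state the actual delay).

```python
import heapq

JOIN_WINDOW_SEC = 60  # hypothetical delay; the real value is not given in the talk


class DelayedLookupJoin:
    """Join request logs with event logs by delaying requests, then looking up events."""

    def __init__(self, delay_sec: int = JOIN_WINDOW_SEC) -> None:
        self.delay_sec = delay_sec
        self.event_store = {}  # stand-in for Redis: key -> list of events
        self.pending = []      # min-heap of (release_time, seq, key, request)
        self._seq = 0          # tie-breaker so the heap never compares requests

    def on_event(self, key, event):
        # Event (impression, click) logs are stored as they arrive.
        self.event_store.setdefault(key, []).append(event)

    def on_request(self, now, key, request):
        # Request logs are held back so that matching events have time to arrive.
        heapq.heappush(self.pending, (now + self.delay_sec, self._seq, key, request))
        self._seq += 1

    def advance(self, now):
        """Release requests whose delay has elapsed and attach their stored events."""
        out = []
        while self.pending and self.pending[0][0] <= now:
            _, _, key, request = heapq.heappop(self.pending)
            out.append((key, request, self.event_store.pop(key, [])))
        return out


join = DelayedLookupJoin(delay_sec=60)
join.on_request(0, "k1", "req-A")
join.on_event("k1", "impression")
join.on_event("k1", "click")
early = join.advance(30)   # delay not elapsed yet, nothing is released
joined = join.advance(60)  # request is released and joined with both events
```

The sketch shows why this design needs an extra store and a delay queue next to the stream itself, which is the part a built-in stream-stream join replaces.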

Join

Window aggregation
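
Windowed aggregation can be illustrated with a small Python sketch. It assumes tumbling (fixed-size, non-overlapping) windows, which is an assumption since the transcript does not say which window type the talk used. The record layout `(timestamp, test group, event type)`, the 60-second window size, and the function names are hypothetical.

```python
from collections import Counter

WINDOW_SIZE_SEC = 60  # hypothetical window size


def window_start(ts: int, size: int = WINDOW_SIZE_SEC) -> int:
    """Return the start time of the tumbling window that contains ts."""
    return ts - (ts % size)


def aggregate(records, size: int = WINDOW_SIZE_SEC) -> Counter:
    """Count records per (window start, test group, event type)."""
    counts = Counter()
    for ts, group, event_type in records:
        counts[(window_start(ts, size), group, event_type)] += 1
    return counts


records = [
    (5, "A", "impression"),
    (20, "A", "click"),
    (59, "B", "impression"),
    (61, "A", "impression"),
    (75, "A", "impression"),
]
counts = aggregate(records)
```

Here the first three records fall into the window starting at 0 and the last two into the window starting at 60, so each AB test group gets one count per window and event type.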

Results
• Simple architecture
  – No Redis to join two streams
• Fast release
  – Interactive development
• Fast monitoring and update
  – Get a performance dashboard
  – Tune fast

Limits
• KSETL depends on
  – ksqlDB
  – Company-wide Kafka
• ksqlDB
  – Some features are missing (still in active development)
  – FlinkSQL may be an alternative
• Company-wide Kafka
  – Good support for all Kafka in LINE
  – But dynamic topic creation is prohibited

Future works
• Data import from Hive
  – Hive tables to enrich Kafka topics
• Enhancing query deployment
  – Better way of executing query scripts

Thank you

LINE DEVDAY 2021
