Streaming ETL in Practice with PostgreSQL, Apache Kafka, and KSQL

PGConf UK 2018 3 Jul 2018 / Robin Moffatt @rmoff [email protected] https://speakerdeck.com/rmoff/

@rmoff / Streaming ETL in Practice with PostgreSQL, Apache Kafka, and KSQL - PGConf UK 2018

$ whoami
• Developer Advocate @ Confluent
• Working in data & analytics since 2001
• Oracle ACE Director & Dev Champion
• Blogging: http://rmoff.net & http://cnfl.io/rmoff
• Twitter: @rmoff
• Geek stuff
• Beer & Fried Breakfasts
https://speakerdeck.com/rmoff/

A bit of a mess…
[Diagram labels: four apps, search, Hadoop, DWH, monitoring, security, two MQs, two caches]

The Streaming Platform
[Diagram labels: KAFKA, DWH, Hadoop, eight apps; request-response, messaging OR stream processing, streaming data pipelines, changelogs]

Database offload → Analytics
[Diagram labels: RDBMS, CDC, HDFS / S3 / BigQuery etc.]

Streaming ETL with Apache Kafka and KSQL
[Diagram labels: RDBMS, CDC, customer, order events, Stream Processing, customer orders]

Real-time Event Stream Enrichment
[Diagram labels: RDBMS, CDC, customer, order events, Stream Processing, customer orders, <y>]

Transform Once, Use Many
[Diagram labels: RDBMS, CDC, customer, order events, Stream Processing, customer orders, <y>, New App <x>]

Transform Once, Use Many
[Diagram labels: RDBMS, CDC, customer, order events, Stream Processing, customer orders, <y>, HDFS / S3 / etc, New App <x>]

KSQL Streaming ETL with Apache Kafka

Streaming Integration with Kafka Connect
[Diagram labels: Kafka Brokers, Kafka Connect, Tasks, Workers, Sources, Sinks, Amazon S3, syslog, flat file, CSV, JSON]

The Connect API of Apache Kafka®
Reliable and scalable integration of Kafka with other systems – no coding required.
✓ Fault tolerant and automatically load balanced
✓ Extensible API
✓ Single Message Transforms
✓ Part of Apache Kafka, included in Confluent Open Source
✓ Centralized management and configuration
✓ Support for hundreds of technologies including RDBMS, Elasticsearch, HDFS, S3
✓ Supports CDC ingest of events from RDBMS
✓ Preserves data schema

    {
      "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
      "connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo",
      "table.whitelist": "sales,orders,customers"
    }

https://docs.confluent.io/current/connect/
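The JSON on this slide is only the connector's settings. To run it, a Kafka Connect worker expects those settings wrapped with a connector name and submitted to its REST API (`POST /connectors`). The following is a minimal sketch of building that request in Python. The connector name `jdbc-source-demo` and the worker address `http://localhost:8083` are assumptions for illustration, not values from the talk, and the request is constructed but never sent. A working JDBC source will typically need further settings (for example a query mode and a topic prefix) that the slide leaves out.

```python
import json
import urllib.request

# Connector settings copied from the slide.
SLIDE_CONFIG = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo",
    "table.whitelist": "sales,orders,customers",
}


def build_create_request(name, config, worker_url="http://localhost:8083"):
    """Build (but do not send) a POST /connectors request for a Connect worker.

    `name` and `worker_url` are illustrative assumptions, not from the talk.
    """
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{worker_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request("jdbc-source-demo", SLIDE_CONFIG)
print(request.get_method(), request.full_url)
print(json.dumps(json.loads(request.data), indent=2))
```

Passing `request` to `urllib.request.urlopen` would submit it to a running worker; that step is left out so the sketch runs without a Kafka Connect cluster.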

Integrating Postgres with Kafka
• Kafka Connect & Debezium
• Kafka Connect & JDBC Sink
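As a rough illustration of the two integration routes named on this slide, the sketch below writes out one possible configuration for each as Python dictionaries. None of these values appear in the talk: hostnames, credentials, the database name `demo`, the server name `pg`, and the topic name are all placeholders. The property names are ones used by the Debezium PostgreSQL connector and the Confluent JDBC sink connector around the time of the talk; newer Debezium releases replace `database.server.name` with `topic.prefix`, so check the documentation for the version in use.

```python
import json

# Source side: Debezium's PostgreSQL connector streams row-level changes into Kafka.
# Every value here is a placeholder assumption.
debezium_source = {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "postgres",
    "database.dbname": "demo",
    "database.server.name": "pg",  # logical name, used as the topic prefix
}

# Sink side: the JDBC sink connector writes a Kafka topic into a Postgres table.
jdbc_sink = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:postgresql://localhost:5432/demo?user=postgres&password=postgres",
    "topics": "UNHAPPY_PLATINUM_CUSTOMERS",  # hypothetical choice of topic
    "auto.create": "true",  # let the connector create the target table
}

for name, config in (("debezium-source", debezium_source), ("jdbc-sink", jdbc_sink)):
    print(name)
    print(json.dumps(config, indent=2))
```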

Confluent Hub
hub.confluent.io
• Launched June 2018
• One-stop place to discover and download:
  • Connectors
  • Transformations
  • Converters

KSQL is a Declarative Stream Processing Language

KSQL is the Streaming SQL Engine for Apache Kafka

KSQL for Streaming ETL
Joining, filtering, and aggregating streams of event data

    CREATE STREAM vip_actions AS
      SELECT userid, page, action
      FROM clickstream c
      LEFT JOIN users u ON c.userid = u.user_id
      WHERE u.level = 'Platinum';
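To make the semantics of the `vip_actions` statement concrete, here is a small Python sketch of the same logic: each event on the stream side looks up the current row for its key on the table side, and the `WHERE` clause then filters the joined result. It is a model of the behaviour under simplifying assumptions (an in-memory dictionary stands in for the `users` table, a list for the `clickstream` topic), not how KSQL is implemented. The sample rows are hypothetical.

```python
def vip_actions(clickstream, users):
    """Left-join each click to the users table on userid, then keep Platinum users.

    `users` maps user_id -> row (the table side); `clickstream` is an iterable
    of click events (the stream side).
    """
    for click in clickstream:
        user = users.get(click["userid"])  # None when there is no match (LEFT JOIN)
        # WHERE u.level = 'Platinum' also discards clicks with no matching user,
        # because a missing level never equals 'Platinum'.
        if user is not None and user["level"] == "Platinum":
            yield {
                "userid": click["userid"],
                "page": click["page"],
                "action": click["action"],
            }


# Hypothetical sample data, not from the talk.
users = {
    1: {"user_id": 1, "level": "Platinum"},
    2: {"user_id": 2, "level": "Gold"},
}
clicks = [
    {"userid": 1, "page": "/checkout", "action": "view"},
    {"userid": 2, "page": "/home", "action": "view"},
    {"userid": 3, "page": "/home", "action": "click"},  # no matching user row
]
result = list(vip_actions(clicks, users))
print(result)
```

Only the click from user 1 survives: user 2 is not Platinum, and user 3 has no row in the table.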

KSQL for Anomaly Detection
Identifying patterns or anomalies in real-time data, surfaced in milliseconds

    CREATE TABLE possible_fraud AS
      SELECT card_number, count(*)
      FROM authorization_attempts
      WINDOW TUMBLING (SIZE 5 SECONDS)
      GROUP BY card_number
      HAVING count(*) > 3;
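The windowing in `possible_fraud` can be sketched in a few lines of Python. A tumbling window is fixed-size and non-overlapping, so each event belongs to exactly one window, which can be identified by rounding its timestamp down to a multiple of the window size. This sketch counts over a finished list of events, whereas KSQL maintains the counts continuously as events arrive; the event data and the millisecond timestamps are hypothetical.

```python
from collections import Counter


def possible_fraud(attempts, window_ms=5_000, threshold=3):
    """Count attempts per (window, card) and keep groups with count > threshold.

    `attempts` is an iterable of (timestamp_ms, card_number) pairs.
    """
    counts = Counter()
    for timestamp_ms, card_number in attempts:
        # Start of the 5-second tumbling window this event falls into.
        window_start = timestamp_ms - (timestamp_ms % window_ms)
        counts[(window_start, card_number)] += 1
    # HAVING count(*) > 3
    return {key: n for key, n in counts.items() if n > threshold}


# Hypothetical events: card "4111" is tried four times inside one window.
events = [
    (1_000, "4111"),
    (2_000, "4111"),
    (3_000, "4111"),
    (4_500, "4111"),
    (4_800, "5500"),
    (6_000, "4111"),  # falls into the next window
]
flagged = possible_fraud(events)
print(flagged)
```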

KSQL for Real-Time Monitoring
• Log data monitoring, tracking and alerting
• syslog data
• Sensor / IoT data

    CREATE STREAM SYSLOG_INVALID_USERS AS
      SELECT HOST, MESSAGE
      FROM SYSLOG
      WHERE MESSAGE LIKE '%Invalid user%';

http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting

Demo Time!
[Diagram labels: Producer API, Kafka Connect, Postgres, Elasticsearch]

Sample rating event:

    {
      "rating_id": 5313,
      "user_id": 3,
      "stars": 4,
      "route_id": 6975,
      "rating_time": 1519304105213,
      "channel": "web",
      "message": "worst. flight. ever. #neveragain"
    }

Sample customer record:

    {
      "id": 3,
      "first_name": "Merilyn",
      "last_name": "Doughartie",
      "email": "[email protected]",
      "gender": "Female",
      "club_status": "platinum",
      "comments": "none"
    }

POOR_RATINGS: filter all ratings where STARS < 3
[Diagram labels: Producer API, POOR_RATINGS]

Sample rating event:

    { "rating_id": 5313, "user_id": 3, "stars": 4, "route_id": 6975, "rating_time": 1519304105213, "channel": "web", "message": "worst. flight. ever. #neveragain" }

    CREATE STREAM POOR_RATINGS AS
      SELECT * FROM ratings
      WHERE STARS < 3;

RATINGS_WITH_CUSTOMER_DATA: join each rating to customer data
UNHAPPY_PLATINUM_CUSTOMERS: filter for just PLATINUM customers
[Diagram labels: Producer API, Kafka Connect]

Sample rating event:

    { "rating_id": 5313, "user_id": 3, "stars": 4, "route_id": 6975, "rating_time": 1519304105213, "channel": "web", "message": "worst. flight. ever. #neveragain" }

Sample customer record:

    { "id": 3, "first_name": "Merilyn", "last_name": "Doughartie", "email": "[email protected]", "gender": "Female", "club_status": "platinum", "comments": "none" }

    CREATE STREAM UNHAPPY_PLATINUM_CUSTOMERS AS
      SELECT * FROM RATINGS_WITH_CUSTOMER_DATA
      WHERE STARS < 3
        AND CLUB_STATUS = 'platinum';

RATINGS_WITH_CUSTOMER_DATA: join each rating to customer data
RATINGS_BY_CLUB_STATUS_1MIN: aggregate per-minute by CLUB_STATUS
[Diagram labels: Producer API, Kafka Connect]

Sample rating event:

    { "rating_id": 5313, "user_id": 3, "stars": 4, "route_id": 6975, "rating_time": 1519304105213, "channel": "web", "message": "worst. flight. ever. #neveragain" }

Sample customer record:

    { "id": 3, "first_name": "Merilyn", "last_name": "Doughartie", "email": "[email protected]", "gender": "Female", "club_status": "platinum", "comments": "none" }

    CREATE TABLE RATINGS_BY_CLUB_STATUS AS
      SELECT CLUB_STATUS, COUNT(*)
      FROM RATINGS_WITH_CUSTOMER_DATA
      WINDOW TUMBLING (SIZE 1 MINUTES)
      GROUP BY CLUB_STATUS;
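The demo follows the "Transform Once, Use Many" shape from earlier in the deck: ratings are enriched with customer data once, and that one derived stream feeds both the unhappy-platinum filter and the per-minute aggregate. The Python sketch below models that fan-out in memory. Several things in it are assumptions rather than material from the talk: ratings with no matching customer are simply dropped, the platinum check follows the slide's caption, `rating_time` is treated as epoch milliseconds, and every record other than the slide's sample rating and customer is hypothetical.

```python
from collections import Counter


def enrich(ratings, customers):
    """Model of RATINGS_WITH_CUSTOMER_DATA: join rating.user_id to customer.id."""
    for rating in ratings:
        customer = customers.get(rating["user_id"])
        if customer is not None:  # assumption: unmatched ratings are dropped
            yield {**rating, "club_status": customer["club_status"]}


def unhappy_platinum(enriched):
    """Model of UNHAPPY_PLATINUM_CUSTOMERS: low-star ratings from platinum members."""
    return [r for r in enriched if r["stars"] < 3 and r["club_status"] == "platinum"]


def ratings_by_club_status(enriched, window_ms=60_000):
    """Model of the per-minute aggregate: count per (window start, club_status)."""
    counts = Counter()
    for r in enriched:
        window_start = r["rating_time"] - (r["rating_time"] % window_ms)
        counts[(window_start, r["club_status"])] += 1
    return dict(counts)


customers = {
    3: {"id": 3, "club_status": "platinum"},  # from the slide's customer record
    7: {"id": 7, "club_status": "bronze"},    # hypothetical
}
ratings = [
    # The slide's sample rating, followed by two hypothetical ones.
    {"rating_id": 5313, "user_id": 3, "stars": 4, "rating_time": 1519304105213},
    {"rating_id": 5314, "user_id": 3, "stars": 1, "rating_time": 1519304110000},
    {"rating_id": 5315, "user_id": 7, "stars": 2, "rating_time": 1519304170000},
]

# Enrich once, then reuse the result for both downstream outputs.
enriched = list(enrich(ratings, customers))
unhappy = unhappy_platinum(enriched)
by_status = ratings_by_club_status(enriched)
print([r["rating_id"] for r in unhappy])
print(by_status)
```

Note that the slide's own sample rating has four stars, so it counts towards the aggregate but does not pass the unhappy-customer filter.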

Confluent Open Source: Apache Kafka with a bunch of cool stuff! For free!
[Diagram labels: Database Changes, Log Events, IoT Data, Web Events, …; CRM, Data Warehouse, Database, Hadoop, Data Integration, …; Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications, …]

Confluent Platform
• Apache Kafka®: Core | Connect API | Streams API
• Data Compatibility: Schema Registry
• Monitoring & Administration: Confluent Control Center | Security
• Operations: Replicator | Auto Data Balancing
• Development and Connectivity: Clients | Connectors | REST Proxy | CLI
• SQL Stream Processing: KSQL
[Legend: Apache Open Source, Confluent Open Source, Confluent Enterprise]

Free Books!
https://www.confluent.io/apache-kafka-stream-processing-book-bundle

@rmoff [email protected] http://cnfl.io/slack https://www.confluent.io/download/

Useful links
• Postgres integration into Kafka
  • http://debezium.io/docs/connectors/postgresql/
  • https://www.simple.com/engineering/a-change-data-capture-pipeline-from-postgresql-to-kafka
  • https://www.slideshare.net/JeffKlukas/postgresql-kafka-the-delight-of-change-data-capture
  • https://blog.insightdatascience.com/from-postgresql-to-redshift-with-kafka-connect-111c44954a6a
• Streaming ETL
  • Embrace the Anarchy: Apache Kafka's Role in Modern Data Architectures (Recording & Slides)
  • Look Ma, no Code! Building Streaming Data Pipelines with Apache Kafka and KSQL
  • Steps to Building a Streaming ETL Pipeline with Apache Kafka and KSQL (Recording & Slides)
  • https://www.confluent.io/blog/ksql-in-action-real-time-streaming-etl-from-oracle-transactional-data
  • https://github.com/confluentinc/ksql/

Resources
• CDC Spreadsheet
• Blog: No More Silos: How to Integrate your Databases with Apache Kafka and CDC
• #partner-engineering on Slack for questions
• BD team (#partners / [email protected]) can help with introductions on a given sales op
#EOF
