Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Kafka Meetup DUS: A Zero Code Tracking Pipeline with Apache Kafka at METRO Markets

René Kerner
September 25, 2019

Shortly before the launch of METRO Markets' marketplace at https://www.metro.de/marktplatz/ we needed to build an event and user tracking platform to log the events relevant to monitoring the business KPIs of our site.
Confronted with frontend JS applications as well as several backend APIs and services, we wanted to find a quick and elegant solution to reliably track business events across this distributed system.
We summarized the problems that might occur and found a way, using Apache Kafka
and tools from the Kafka ecosystem, like Confluent REST Proxy
and Confluent's Kafka Connect HTTP Sink, to easily build a pipeline that gathers all tracking messages in the distributed log and forwards them to our analytics providers without writing a single line of code.
The slides show the problems and ideas.
The demo is available on GitHub: https://github.com/rk3rn3r/kafka-meetup-2019-09


Transcript

  1. Up Next: "A Zero Code Tracking Pipeline with Apache Kafka at METRO Markets" by René Kerner (Software Engineer/Architect at METRO Markets)
     https://lparchive.org/The-Secret-of-Monkey-Island/Update%201/1-somi_001.gif
  2. The Problem?
     - Tracking users reliably in a distributed environment
     - Different programming languages
     - Ordering and duplicate handling
     - Batching
  3. More Problems
     - Tracking users reliably in a distributed environment
     - Different programming languages
     - Ordering and duplicate handling
     - Batching
     - Rate limits
     - Client side connections down
  4. More Problems
     - Tracking users reliably in a distributed environment
     - Different programming languages
     - Ordering and duplicate handling
     - Batching
     - Rate limits
     - Client side connections down
     - Server side connections down
     - Different/additional providers
  5. Solution
     - Tracking users reliably in a distributed environment
     - Different programming languages
     - Ordering and duplicate handling
     - Batching
     - Rate limits
     - Client side connections down
     - Server side connections down
     - Different/additional providers
     → careful about retries (see the producer sketch after this slide)
     → send data to your own site first
     → HTTP as common interface / API
     → use Apache Kafka
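     To make "careful about retries" concrete, here is a minimal sketch (not from the deck) of one server-side option: Kafka's idempotent producer, which retries internally without writing duplicates or reordering messages. Broker address, topic, and key are hypothetical placeholders.

       # Minimal sketch (not from the deck): an idempotent Kafka producer that
       # retries safely without creating duplicates. Broker and topic names
       # are hypothetical placeholders.
       from confluent_kafka import Producer

       producer = Producer({
           "bootstrap.servers": "kafka.metro-markets.local:9092",  # hypothetical broker
           "enable.idempotence": True,  # broker de-duplicates internal retries
           "acks": "all",               # required by idempotence
           "retries": 5,                # safe: no duplicates, ordering preserved
       })

       def delivery_report(err, msg):
           # Called once per message with the final delivery result.
           if err is not None:
               print(f"delivery failed: {err}")

       producer.produce("tracking-events", key="user-123",
                        value='{"event": "page_view"}', callback=delivery_report)
       producer.flush()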
  6. Confluent REST Proxy
     - RESTful HTTP interface / API to a Kafka cluster
     - Read cluster metadata (brokers, topics, partitions, and configs)
     - Producer HTTP API to send messages to topic/s
     - Consumer HTTP API to read messages from topic/s (flow sketched below)
     - Supports different data formats: JSON, raw bytes encoded with base64, or JSON-encoded Avro (using different Content-Type headers)
     - Scalable to multiple instances, including HA/high-availability scenarios → set a unique id (group id) for every instance
     - Docs: https://docs.confluent.io/current/kafka-rest/index.html
     - Src on GitHub: https://github.com/confluentinc/kafka-rest
     - OSS: Confluent Community License Agreement 1.0
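     As an illustration of the Consumer HTTP API, a minimal sketch (not from the deck) of the create/subscribe/poll/delete flow; host, group, instance, and topic names are hypothetical placeholders.

       # Minimal sketch (not from the deck) of the REST Proxy Consumer HTTP API:
       # create a consumer instance, subscribe it, poll records, delete it.
       # Host, group, and topic names are hypothetical placeholders.
       import requests

       BASE = "http://kafkaproxy.metro-markets.local:8082"
       V2_JSON = "application/vnd.kafka.v2+json"

       # 1. Create a consumer instance in the group "tracking-readers".
       r = requests.post(f"{BASE}/consumers/tracking-readers",
                         headers={"Content-Type": V2_JSON},
                         json={"name": "instance-1", "format": "json",
                               "auto.offset.reset": "earliest"})
       base_uri = r.json()["base_uri"]  # URI of the new consumer instance

       # 2. Subscribe the instance to the tracking topic.
       requests.post(f"{base_uri}/subscription",
                     headers={"Content-Type": V2_JSON},
                     json={"topics": ["tracking-events"]})

       # 3. Poll records in the JSON-embedded format.
       records = requests.get(f"{base_uri}/records",
                              headers={"Accept": "application/vnd.kafka.json.v2+json"})
       for record in records.json():
           print(record["topic"], record["offset"], record["value"])

       # 4. Clean up the consumer instance.
       requests.delete(base_uri, headers={"Content-Type": V2_JSON})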
  7. Confluent REST Proxy (cont.)
     - RESTful HTTP interface / API to a Kafka cluster
     - Read cluster metadata (brokers, topics, partitions, and configs)
     - Producer HTTP API to send messages to topic/s
     - Consumer HTTP API to read messages from topic/s
     - Supports different data formats: JSON, raw bytes encoded with base64, or JSON-encoded Avro (using different Content-Type headers)
     - Scalable to multiple instances, including HA/high-availability scenarios → set a unique id (consumer group id) for every instance
     - Docs: https://docs.confluent.io/current/kafka-rest/index.html
     - Src on GitHub: https://github.com/confluentinc/kafka-rest
     - OSS: Confluent Community License Agreement 1.0

     Example producer request (a Python equivalent follows below):

       POST /topics/test HTTP/1.1
       Host: kafkaproxy.metro-markets.local
       Content-Type: application/vnd.kafka.binary.v2+json

       {
         "records": [
           { "key": "a2V5", "value": "Y29uZmx1ZW50" },
           { "value": { "field1": "my-data", "field2": 12345 } }
         ]
       }
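     For comparison with the raw HTTP request above, a minimal Python sketch (not from the deck) that produces JSON-embedded records instead of base64-encoded binary ones; host and topic are hypothetical placeholders.

       # Minimal sketch (not from the deck): producing JSON records through the
       # REST Proxy Producer HTTP API. Host and topic are hypothetical.
       import requests

       resp = requests.post(
           "http://kafkaproxy.metro-markets.local:8082/topics/tracking-events",
           headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
           json={"records": [
               {"key": "user-123",
                "value": {"event": "page_view", "path": "/marktplatz"}},
           ]},
       )
       # REST Proxy answers with the partition and offset of each stored record.
       print(resp.json()["offsets"])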
  8. Confluent HTTP Sink Connector (Kafka Connect)
     - Integrates Kafka with an API via HTTP or HTTPS
     - Consumes records from Kafka topic/s
     - Converts each record to a String before sending it in the request body → will break JSON when using a Single Message Transform (SMT)
     - Sends the message value or its fields to an HTTP endpoint
     - Kafka Connect Distributed (KCD) REST API to configure and manage
     - Scalable to multiple instances, including HA/high-availability scenarios → using Kafka Connect Distributed (KCD)
     - Docs: https://docs.confluent.io/current/connect/kafka-connect-http/index.html
     - Proprietary: Confluent Community License Agreement 1.0
  9. Confluent HTTP Sink Connector (Kafka Connect) (cont.)
     - Integrates Kafka with an API via HTTP or HTTPS
     - Consumes records from Kafka topic/s
     - Converts each record to a String before sending it in the request body → will break JSON when using a Single Message Transform (SMT)
     - Sends the message value or its fields to an HTTP endpoint
     - REST API to configure and manage with Kafka Connect Distributed (KCD) (a registration sketch follows below)
     - Scalable to multiple instances, including HA/high-availability scenarios → using Kafka Connect Distributed (KCD)
     - Docs: https://docs.confluent.io/current/connect/kafka-connect-http/index.html
     - Proprietary: Confluent Community License Agreement 1.0
     - OSS Kafka Connect HTTP Sink Connectors available, e.g. from thomaskwscott
       - Config-compatible with the official Confluent HTTP Sink Connector
       - Docs: https://thomaskwscott.github.io/kafka-connect-http/sink_connector.html
       - Src on GitHub: https://github.com/thomaskwscott/kafka-connect-http
     - METRO Markets might release one on GitHub soon …
       - GZIP compression
       - Proper JSON, also when using SMTs
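     To show how "configure and manage with KCD" looks in practice, a minimal sketch (not from the deck) that registers an HTTP Sink connector through the Kafka Connect Distributed REST API. The config keys follow the Confluent HTTP Sink docs, but exact names can vary between connector versions; host, topic, and target URL are hypothetical placeholders.

       # Minimal sketch (not from the deck): registering an HTTP Sink connector
       # via the Kafka Connect Distributed (KCD) REST API. Host, topic, and
       # target URL are hypothetical; license-related settings are omitted.
       import requests

       connector = {
           "name": "tracking-http-sink",
           "config": {
               "connector.class": "io.confluent.connect.http.HttpSinkConnector",
               "topics": "tracking-events",  # topic/s to consume from
               "http.api.url": "https://analytics-provider.example/collect",
               "request.method": "POST",
               "tasks.max": "1",
           },
       }

       # KCD listens on port 8083 by default; POST /connectors creates one.
       resp = requests.post(
           "http://kafka-connect.metro-markets.local:8083/connectors",
           json=connector)
       print(resp.status_code, resp.json())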