
Kafka Meetup DUS: A Zero Code Tracking Pipeline with Apache Kafka at METRO Markets

René Kerner
September 25, 2019


Shortly before the launch of the METRO Markets marketplace at https://www.metro.de/marktplatz/, we needed to build an event and user tracking platform to log the events relevant for monitoring the business KPIs of our site.
With frontend JS applications as well as several backend APIs and services, we wanted a quick and elegant solution to reliably track business events across this distributed system.
We summarized the problems that might occur and found a way, using Apache Kafka and tools from the Kafka ecosystem such as the Confluent REST Proxy and Confluent's Kafka Connect HTTP Sink, to easily build a pipeline that gathers all tracking messages in the distributed log and forwards them to our analytics providers without writing a single line of code.
The slides show the problems and ideas.
The demo is available on GitHub: https://github.com/rk3rn3r/kafka-meetup-2019-09


Transcript

  1. Up Next: "A Zero Code Tracking Pipeline with Apache Kafka at METRO Markets"
    by René Kerner
    (Software Engineer/Architect at METRO Markets)
    https://lparchive.org/The-Secret-of-Monkey-Island/Update%201/1-somi_001.gif


  2. René Kerner, Software Engineer/Architect
    17
    @rk3rn3r
    [bio timeline: ~10 yrs Software Engineering, ~3 yrs; 2011 - 2018, 06/2018 - 07/2019, at METRO Markets since 08/2019]


  3. 18
    "A Zero Code Tracking Pipeline with Apache
    Kafka at METRO Markets"


  4. The Problem?
    19
    - Tracking users reliably in a distributed environment
    - Different programming languages
    - Ordering and duplicate handling
    - Batching


  5. More Problems
    20
    - Tracking users reliably in a distributed environment
    - Different programming languages
    - Ordering and duplicate handling
    - Batching
    - Rate limits
    - Client-side connections down


  6. More Problems
    21
    - Tracking users reliably in a distributed environment
    - Different programming languages
    - Ordering and duplicate handling
    - Batching
    - Rate limits
    - Client-side connections down
    - Server-side connections down
    - Different/additional providers


  7. Solution
    22
    - Tracking users reliably in a distributed environment
    - Different programming languages
    - Ordering and duplicate handling
    - Batching
    - Rate limits
    - Client-side connections down
    - Server-side connections down
    - Different/additional providers
    → careful about retries
    → send data to your own site first
    → HTTP as common interface / API
    → use Apache Kafka
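The "careful about retries" point can be sketched as a small helper with capped, exponential backoff. This is an illustrative sketch, not code from the deck; the function name and parameters are made up, and `send` stands in for whatever delivery callable you use (e.g. an HTTP POST to your own tracking endpoint):

```python
import time

def post_with_retries(send, payload, max_retries=5, base_delay=0.2):
    """Try to deliver a tracking payload, backing off exponentially.

    `send` is any callable that raises on failure. Blind, unbounded
    retries can duplicate events or hammer a struggling provider,
    hence the retry cap and the growing delay between attempts.
    """
    for attempt in range(max_retries):
        try:
            return send(payload)
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * (2 ** attempt))
```

Note that retrying on the client can still produce duplicates on the server side, which is exactly why the deck lists "ordering and duplicate handling" as a separate problem.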


  8. Idea
    23


  9. Confluent REST Proxy
    24
    - RESTful HTTP interface / API to a Kafka cluster
    - Read cluster metadata (brokers, topics, partitions, and configs)
    - Producer HTTP API to send messages to topic/s
    - Consumer HTTP API to read messages from topic/s
    - Supports different data formats: JSON, raw bytes encoded with base64, or JSON-encoded Avro (using different Content-Type headers)
    - Scalable to multiple instances, including HA/high-availability scenarios → set a unique id (consumer group id) for every instance
    - Docs: https://docs.confluent.io/current/kafka-rest/index.html
    - Src on GitHub: https://github.com/confluentinc/kafka-rest
    - OSS: Confluent Community License Agreement 1.0


  10. Confluent REST Proxy
    25
    - RESTful HTTP interface / API to a Kafka cluster
    - Read cluster metadata (brokers, topics, partitions, and configs)
    - Producer HTTP API to send messages to topic/s
    - Consumer HTTP API to read messages from topic/s
    - Supports different data formats: JSON, raw bytes encoded with base64, or JSON-encoded Avro (using different Content-Type headers)
    - Scalable to multiple instances, including HA/high-availability scenarios → set a unique id (consumer group id) for every instance
    - Docs: https://docs.confluent.io/current/kafka-rest/index.html
    - Src on GitHub: https://github.com/confluentinc/kafka-rest
    - OSS: Confluent Community License Agreement 1.0

    POST /topics/test HTTP/1.1
    Host: kafkaproxy.metro-markets.local
    Content-Type: application/vnd.kafka.binary.v2+json

    {
      "records": [
        {
          "key": "a2V5",
          "value": "Y29uZmx1ZW50"
        }
      ]
    }

    (With the binary content type, keys and values must be base64-encoded strings; a record like {"value": {"field1": "my-data", "field2": 12345}} needs Content-Type: application/vnd.kafka.json.v2+json instead.)
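As a sketch of how a producer would build such a request body for the REST Proxy's binary embedded format: keys and values are base64-encoded before being wrapped in the `records` envelope. The helper name is made up for illustration:

```python
import base64
import json

def binary_produce_body(records):
    """Build a REST Proxy produce body (binary embedded format).

    `records` is a list of (key, value) byte-string pairs; keys may
    be None. Both fields must be base64-encoded in the JSON body.
    """
    encoded = []
    for key, value in records:
        rec = {"value": base64.b64encode(value).decode("ascii")}
        if key is not None:
            rec["key"] = base64.b64encode(key).decode("ascii")
        encoded.append(rec)
    return json.dumps({"records": encoded})

# The same record as on the slide: POST this body to /topics/<topic>
# with Content-Type: application/vnd.kafka.binary.v2+json
body = binary_produce_body([(b"key", b"confluent")])
```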


  11. Confluent HTTP Sink Connector (Kafka Connect)
    26
    - Integrates Kafka with an API via HTTP or HTTPS
    - Consumes records from Kafka topic/s
    - Converts each record to a String before sending it in the request body → will break JSON when using Single Message Transforms (SMTs)
    - Sends the message value or its fields to an HTTP endpoint
    - Kafka Connect Distributed (KCD) REST API to configure and manage
    - Scalable to multiple instances, including HA/high-availability scenarios → using Kafka Connect Distributed (KCD)
    - Docs: https://docs.confluent.io/current/connect/kafka-connect-http/index.html
    - Proprietary: Confluent Community License Agreement 1.0


  12. Confluent HTTP Sink Connector (Kafka Connect)
    27
    - Integrates Kafka with an API via HTTP or HTTPS
    - Consumes records from Kafka topic/s
    - Converts each record to a String before sending it in the request body → will break JSON when using Single Message Transforms (SMTs)
    - Sends the message value or its fields to an HTTP endpoint
    - REST API to configure and manage with Kafka Connect Distributed (KCD)
    - Scalable to multiple instances, including HA/high-availability scenarios → using Kafka Connect Distributed (KCD)
    - Docs: https://docs.confluent.io/current/connect/kafka-connect-http/index.html
    - Proprietary: Confluent Community License Agreement 1.0
    - OSS Kafka Connect HTTP Sink Connectors available
      - e.g. from thomaskwscott
      - Config-compatible with the official Confluent HTTP Sink Connector
      - Docs: https://thomaskwscott.github.io/kafka-connect-http/sink_connector.html
      - Src on GitHub: https://github.com/thomaskwscott/kafka-connect-http
    - METRO Markets might release one on GitHub soon …
      - GZIP compression
      - Proper JSON, also when using SMTs
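Registering such a sink through the Kafka Connect Distributed REST API might look like the sketch below. The connector name, topic, and endpoint URL are made-up examples; `connector.class` and `http.api.url` follow the Confluent HTTP Sink Connector docs linked above, and a real deployment would need additional settings (converters, auth, licensing):

```python
import json
from urllib import request

# Hypothetical connector config for the Confluent HTTP Sink Connector.
connector = {
    "name": "tracking-http-sink",
    "config": {
        "connector.class": "io.confluent.connect.http.HttpSinkConnector",
        "topics": "tracking-events",
        "http.api.url": "https://analytics.example.com/collect",
        "tasks.max": "1",
    },
}

def register(connect_url, payload):
    """POST a connector config to the Kafka Connect Distributed REST API."""
    req = request.Request(
        connect_url + "/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return request.urlopen(req)

# register("http://kafka-connect.local:8083", connector)
```

This is the "No Code! Only Config!" idea in practice: the pipeline is assembled entirely by POSTing configuration, not by writing producer or consumer code.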


  13. Back to the Idea
    28


  14. Idea
    29
    No Code!
    Only Config!


  15. 30
    DEMO TIME


  16. Additional Benefits
    31


  17. We are hiring!
    32
    https://www.metro-markets.de/careers/


  18. 33
    Thank you
    for your attention!


  19. Q&A
    34
    http://indigo.ie/~rdshiels/monkey/guyswing.gif
