Keep your Data Close and your Caches Hotter using Apache Kafka, KSQL and Connect

Ricardo Ferreira

April 02, 2019

Transcript

  1. Keep your Data Close and your Caches Hotter using Apache Kafka, Connect and KSQL

    Ricardo Ferreira, Developer Advocate @riferrei #KafkaSummit
  2. About Me:

    • Hi, my name is Ricardo Ferreira
    • Developer Advocate @ Confluent
    • Currently into Cloud & DevOps
    • Ex-Oracle, Red Hat, IONA Tech
    • https://riferrei.net
  3. Data is only useful if it is Fresh and Contextual

  4. There are three parts in an airbag system:

    • The bag itself.
    • The sensors, which tell the bag to inflate when a collision is probable, based on speed.
    • The inflation system, which combines two compounds [Sodium Azide (NaN3) and Potassium Nitrate (KNO3)] to produce nitrogen gas and inflate the bag.

    What if the airbag deploys 30 seconds after the collision?
  5. December 6th, 2010: Commuter rail train hits elderly driver

    • A 70-year-old lady hears on the news that there will be no commuter rail train that day.
    • She tries to beat the train as it speeds through Groove Street, but there is not enough time to brake.
    • Luckily, she is still alive.

    What if the information about the commuter rail train is outdated?
  6. Caches can be a Solution for Data that is Fresh

  7. APIs need to access data freely and easily

    • Data should never be treated as a scarce resource in applications
    • Latency should be kept to a minimum to ensure a better user experience
    • Data should not be static: keep the data fresh continuously
    • Find ways to handle large amounts of data without breaking the APIs

    Diagram labels: Cache, API, Read, Write
  8. Caches can be either built-in or distributed

    • If the data can fit into the API's memory, then you should use built-in caches
    • Otherwise, you may need to use distributed caches for large sizes
    • Some cache implementations provide the best of both cases
    • For distributed caches, make sure to always find a good way to keep access at O(1)

    Diagram labels: Built-in Caches (Cache, API, Read, Write); Distributed Caches (Cache, API, Read, Write)
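
The built-in case above can be sketched as a small in-process cache with constant-time reads and writes. This is a minimal illustration, not the cache used in the talk: the class name `BuiltInCache`, the `capacity` parameter, and the least-recently-used eviction policy are all assumptions added for the example.

```python
from collections import OrderedDict


class BuiltInCache:
    """Hypothetical in-process cache: O(1) get/put with LRU eviction (an assumption)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries = OrderedDict()  # keeps keys ordered from least to most recently used

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)  # mark the key as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def __len__(self):
        return len(self._entries)
```

Once the data no longer fits in a structure like this inside the API process, the slide's advice is to move to a distributed cache while keeping key lookups constant-time.
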
  9. Demo

  10. Let’s Tweet the Song!

    1. Access your Twitter account.
    2. Use #KafkaSummit in your tweet.
    3. The name of the song must be within brackets as shown below.
  11. Application X-Ray:

    • Confluent Cloud Cluster
    • AWS and Terraform
    • Spring Boot Application
    • Apache Kafka Connect
    • Confluent KSQL
    • Redis Cache
    • AWS Lambda
    • Amazon Alexa
  12. Application X-Ray:

    • Confluent Cloud Cluster
    • AWS and Terraform
    • Spring Boot Application
    • Apache Kafka Connect
    • Confluent KSQL
    • Redis Cache
    • AWS Lambda
    • Amazon Alexa

    You can find the source code of this application here:
  13. Caching Patterns

  14. Caching Pattern: Refresh Ahead

    • Proactively updates the cache
    • Keeps the entries always in sync
    • Ideal for latency-sensitive cases
    • Ideal when data reads are costly
    • It may need initial data loading

    Diagram labels: Kafka Connect, Cache, API
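
The Refresh Ahead idea can be sketched without any Kafka dependency: changes are pushed into the cache as they arrive, so the API only ever reads from the cache. In this sketch a plain list of records stands in for a Kafka topic and `apply_changes` stands in for what a sink connector would do. The names `ChangeRecord`, `apply_changes`, and `api_read` are hypothetical and are not part of Kafka Connect.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical change event; a value of None models a delete."""
    key: str
    value: Optional[str]


def apply_changes(cache: Dict[str, str], records: Iterable[ChangeRecord]) -> int:
    """Push every change into the cache before any reader asks for it."""
    applied = 0
    for record in records:
        if record.value is None:
            cache.pop(record.key, None)  # remove deleted entries
        else:
            cache[record.key] = record.value  # insert or overwrite
        applied += 1
    return applied


def api_read(cache: Dict[str, str], key: str) -> Optional[str]:
    """The API never touches the source system; it only reads the cache."""
    return cache.get(key)
```

The "initial data loading" bullet corresponds to running `apply_changes` once over the existing data before the API starts serving reads.
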
  15. Caching Pattern: Refresh Ahead / Adapt

    • Proactively updates the cache
    • Keeps the entries always in sync
    • Ideal for latency-sensitive cases
    • Ideal when data reads are costly
    • It may need initial data loading

    Diagram labels: Kafka Connect, Application, Cache, API. Callouts: Transform and adapt records before delivery; Schema Registry for canonical models
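
"Transform and adapt records before delivery" can be pictured as a chain of small functions applied to each record on its way to the cache. The sketch below is only an illustration of that idea in plain Python; it does not use Kafka Connect's transformation API, and the helper names `rename_field`, `drop_field`, and `adapt` as well as the field names in the usage note are made up for the example.

```python
from typing import Callable, Dict, Iterable

Record = Dict[str, object]
Transform = Callable[[Record], Record]


def rename_field(old: str, new: str) -> Transform:
    """Build a transform that renames one field, leaving the input record untouched."""
    def transform(record: Record) -> Record:
        adapted = dict(record)  # copy so the original record is not mutated
        if old in adapted:
            adapted[new] = adapted.pop(old)
        return adapted
    return transform


def drop_field(name: str) -> Transform:
    """Build a transform that removes one field from the record."""
    def transform(record: Record) -> Record:
        return {k: v for k, v in record.items() if k != name}
    return transform


def adapt(record: Record, transforms: Iterable[Transform]) -> Record:
    """Apply the transforms in order and return the adapted record."""
    for transform in transforms:
        record = transform(record)
    return record
```

For instance, `adapt(record, [rename_field("song_name", "title"), drop_field("internal_id")])` would deliver a record shaped for the cache rather than for the source system.
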
  16. Caching Pattern: Write Behind

    • Removes I/O pressure from app
    • Allows true horizontal scalability
    • Ensures ordering and persistence
    • Minimizes DB code complexity
    • Totally handles DB unavailability

    Diagram labels: Kafka Connect, Application, Cache, API
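
A rough sketch of Write Behind, under simplifying assumptions: the application writes to the cache and appends the change to an ordered queue, and a separate step later drains the queue into the database. Here an in-memory `deque` stands in for a durable Kafka topic and a `dict` stands in for the database; the class `WriteBehindCache` and the `database_available` flag are hypothetical and exist only to show why a database outage does not block the application.

```python
from collections import deque


class WriteBehindCache:
    """Hypothetical write-behind cache; the queue stands in for a Kafka topic."""

    def __init__(self, database):
        self.entries = {}         # what the application reads and writes
        self.pending = deque()    # ordered changes not yet written to the database
        self.database = database  # a dict standing in for the real database

    def write(self, key, value):
        """Fast path: update the cache and queue the change, no database I/O."""
        self.entries[key] = value
        self.pending.append((key, value))

    def flush(self, database_available=True):
        """Drain queued changes in order; keep them queued if the database is down."""
        if not database_available:
            return 0
        written = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.database[key] = value
            written += 1
        return written
```

In the architecture on the slide, the draining step would be done by Kafka Connect rather than by application code, which is where the reduced database code complexity comes from.
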
  17. Caching Pattern: Write Behind / Adapt

    • Removes I/O pressure from app
    • Allows true horizontal scalability
    • Ensures ordering and persistence
    • Minimizes DB code complexity
    • Totally handles DB unavailability

    Diagram labels: Kafka Connect, Application, Cache, API. Callouts: Transform and adapt records before delivery; Schema Registry for canonical models
  18. Caching Pattern: Event Federation

    • Replicates data across regions
    • Keeps multiple regions in sync
    • Great to improve RPO and RTO
    • Handles lazy/slow networks well
    • Works well if it's used along with Read-Through and Write-Through patterns.

    Diagram labels: Confluent Replicator <<MirrorMaker>>
  19. Kafka Connect Implementation Strategies

  20. Kafka Connect support for In-Memory Caches

    • Connector for Redis is open and available in Confluent Hub
    • Connector for Memcached is open and available in Confluent Hub
    • Connectors for both GridGain and Apache Ignite implementations.
    • Connector for Infinispan is open and is maintained by Red Hat
  21. Frameworks for other In-Memory Caches

    • Oracle provides HotCache from GoldenGate for Oracle Coherence
    • Hazelcast has the Jet framework, which provides support for Kafka
    • Pivotal GemFire (Apache Geode) has good support from Spring
    • Good news: you can always write your own sink using Connect API

    Diagram labels: Oracle GoldenGate; Hazelcast Jet; Spring Data; Spring Kafka; Connect Framework; Any Cache
  22. Interested in DB CDC? Then meet Debezium!

    • Amazing CDC technology to pull data out of databases into Kafka
    • Works at the log level, which means a true CDC implementation for your projects instead of record polling
    • Open source, maintained by Red Hat, with broad support for many popular databases.
    • It is built on top of Kafka Connect
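
The difference between log-level CDC and record polling can be shown with a toy model. The sketch below is not Debezium's API or event format: `Table`, `poll_changes`, and `read_log` are hypothetical names, and the append-only list is only a stand-in for a database transaction log. It illustrates that comparing snapshots misses intermediate updates and deletes, while reading the log sees every change in order.

```python
class Table:
    """Toy table with an append-only change log (stand-in for a transaction log)."""

    def __init__(self):
        self.rows = {}
        self.log = []

    def upsert(self, key, value):
        op = "update" if key in self.rows else "insert"
        self.rows[key] = value
        self.log.append((op, key, value))

    def delete(self, key):
        if key in self.rows:
            del self.rows[key]
            self.log.append(("delete", key, None))


def poll_changes(previous_snapshot, table):
    """Record polling: report rows whose current value differs from the last snapshot."""
    return [("upsert", key, value)
            for key, value in table.rows.items()
            if previous_snapshot.get(key) != value]


def read_log(table, offset):
    """Log-based capture: return every change recorded since the given offset."""
    return table.log[offset:]
```

If a row is inserted, updated, and another row is created and deleted between two polls, `poll_changes` reports only the final state of the surviving row, whereas `read_log` returns all four changes.
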
  23. Support for Running Kafka Connect Servers

    • Run it yourself on bare metal: https://kafka.apache.org/downloads https://www.confluent.io/download
    • IaaS on AWS or Google Cloud: https://github.com/confluentinc/ccloud-tools
    • Running using Docker containers: https://hub.docker.com/r/confluentinc/cp-kafka-connect/
    • Running using Kubernetes: https://github.com/confluentinc/cp-helm-chart https://www.confluent.io/confluent-operator/
  24. Please Stay in Touch:

    @riferrei riferrei riferrei ricardo@confluent.io https://riferrei.net https://cnfl.io/slack