Keep your Data Close and your Caches Hotter using Apache Kafka, KSQL and Connect

Ricardo Ferreira

April 02, 2019

Transcript

  1. Keep your Data Close and your Caches Hotter using Apache Kafka, Connect and KSQL
    Ricardo Ferreira, Developer Advocate
    @riferrei #KafkaSummit
  2. About Me:
    • Hi, my name is Ricardo Ferreira
    • Developer Advocate @ Confluent
    • Currently into Cloud & DevOps
    • Ex-Oracle, Red Hat, IONA Tech
    • https://riferrei.net
  3. There are three parts in an airbag system:
    • The bag itself.
    • The sensors, which tell the bag to inflate when they detect a probable collision based on speed.
    • The inflation system, which combines two compounds, sodium azide (NaN3) and potassium nitrate (KNO3), to produce nitrogen gas and inflate the bag.
    What if the airbag deploys 30 seconds after the collision?
  4. December 6th, 2010: Commuter rail train hits elderly driver
    • A 70-year-old lady heard on the news that there would be no commuter rail train that day.
    • She tried to beat the train as it sped through Groove Street, but there was not enough time to brake.
    • Luckily, she is still alive.
    What if the information about the commuter rail train is outdated?
  5. APIs need to access data freely and easily
    • Data should never be treated as a scarce resource in applications
    • Latency should be kept to a minimum to ensure a better user experience
    • Data should not be static: keep it fresh continuously
    • Find ways to handle large amounts of data without breaking the APIs
    [Diagram: API reading from and writing to a Cache]
  6. Caches can be either built-in or distributed
    • If the data fits into the API's memory, use built-in caches
    • Otherwise, you may need distributed caches to hold larger data sets
    • Some cache implementations provide the best of both worlds
    • For distributed caches, always make sure there is a good way to keep lookups O(1)
    [Diagram: Built-in Caches vs. Distributed Caches, each showing an API reading from and writing to a cache]
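One way to read the O(1) bullet: in a distributed cache, finding the node that owns a key should not require asking every node. The following is a minimal sketch, assuming a fixed list of hypothetical node names and plain hash-modulo routing. Real distributed caches typically use hash slots or consistent hashing so that adding a node does not remap most keys.

```python
import hashlib


class ShardedCache:
    """Toy distributed cache: each key is owned by exactly one node."""

    def __init__(self, node_names):
        # One dict per node stands in for a remote cache server.
        self.names = list(node_names)
        self.nodes = {name: {} for name in self.names}

    def owner(self, key):
        # A stable hash (not Python's per-process randomized hash()) maps the
        # key to a node in constant time, however many keys are stored.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.names[int.from_bytes(digest[:8], "big") % len(self.names)]

    def put(self, key, value):
        self.nodes[self.owner(key)][key] = value

    def get(self, key, default=None):
        return self.nodes[self.owner(key)].get(key, default)


sharded = ShardedCache(["node-a", "node-b", "node-c"])  # hypothetical nodes
sharded.put("song:1", "Imagine")
print(sharded.owner("song:1"), sharded.get("song:1"))
```

Both `put` and `get` touch a single node, so the cost of a lookup does not grow with the number of nodes or entries.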
  7. Let’s Tweet the Song!
    1. Access your Twitter account.
    2. Use #KafkaSummit in your tweet.
    3. The name of the song must be within brackets as shown below.
  8. Application X-Ray:
    • Confluent Cloud Cluster
    • AWS and Terraform
    • Spring Boot Application
    • Apache Kafka Connect
    • Confluent KSQL
    • Redis Cache
    • AWS Lambda
    • Amazon Alexa
  9. Application X-Ray:
    • Confluent Cloud Cluster
    • AWS and Terraform
    • Spring Boot Application
    • Apache Kafka Connect
    • Confluent KSQL
    • Redis Cache
    • AWS Lambda
    • Amazon Alexa
    You can find the source code of this application here:
  10. Caching Pattern: Refresh Ahead
    • Proactively updates the cache
    • Keeps the entries always in sync
    • Ideal for latency-sensitive cases
    • Ideal when reading data is costly
    • May need initial data loading
    [Diagram: Kafka Connect, Cache, API]
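The refresh-ahead idea can be sketched as a loop that applies every change event to the cache before any reader asks for it. This is an in-memory illustration only: the list `events` stands in for a Kafka topic fed by a source connector, the dict stands in for the cache, and all keys and values are hypothetical.

```python
def refresh_ahead(cache, change_events):
    """Apply change events to the cache proactively, in arrival order."""
    for key, value in change_events:
        if value is None:
            # A None value plays the role of a tombstone: the entry was deleted.
            cache.pop(key, None)
        else:
            cache[key] = value
    return cache


# Initial data loading: the first events seed the cache before any read happens.
events = [
    ("train:101", "on time"),
    ("train:102", "on time"),
    ("train:101", "cancelled"),
    ("train:102", None),
]
cache = refresh_ahead({}, events)


def api_read(key):
    # The API only ever reads from the cache; it never queries the database.
    return cache.get(key)


print(api_read("train:101"))
```

Because updates arrive ahead of reads, the API never pays the cost of a cache miss against the source system.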
  11. Caching Pattern: Refresh Ahead / Adapt
    • Proactively updates the cache
    • Keeps the entries always in sync
    • Ideal for latency-sensitive cases
    • Ideal when reading data is costly
    • May need initial data loading
    [Diagram: Kafka Connect, Application, Cache, API. Notes: Transform and adapt records before delivery; Schema Registry for canonical models]
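"Transform and adapt records before delivery" is the job Kafka Connect gives to Single Message Transforms. The sketch below imitates that idea in plain Python: each record passes through a chain of small functions before it is written to the cache. The record layout, field names, and helper functions are hypothetical and are not part of any Kafka API.

```python
def rename_field(old, new):
    """Return a transform that renames one field of a record value."""
    def transform(record):
        value = dict(record["value"])  # copy, so the input record is untouched
        if old in value:
            value[new] = value.pop(old)
        return {**record, "value": value}
    return transform


def drop_field(name):
    """Return a transform that removes one field of a record value."""
    def transform(record):
        value = {k: v for k, v in record["value"].items() if k != name}
        return {**record, "value": value}
    return transform


def deliver(records, transforms, cache):
    """Run each record through the transform chain, then write it to the cache."""
    for record in records:
        for transform in transforms:
            record = transform(record)
        cache[record["key"]] = record["value"]
    return cache


records = [{"key": "song:1", "value": {"TITLE": "Thriller", "internal_id": 7}}]
chain = [rename_field("TITLE", "title"), drop_field("internal_id")]
adapted = deliver(records, chain, {})
print(adapted)
```

Keeping each transform small and stateless makes the chain easy to reorder, which mirrors how transforms are listed in a connector's configuration.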
  12. Caching Pattern: Write Behind
    • Removes I/O pressure from the app
    • Allows true horizontal scalability
    • Ensures ordering and persistence
    • Minimizes DB code complexity
    • Fully handles DB unavailability
    [Diagram: Kafka Connect, Application, Cache, API]
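A minimal sketch of write-behind, assuming an in-memory queue in place of a Kafka topic and a hypothetical `FlakyDatabase` in place of the real store. The application writes only to the cache and to an ordered log; a separate drain step (the role Kafka Connect plays in the deck) pushes the log to the database and simply retries later if the database is down. Unlike Kafka, the queue here is not persistent.

```python
from collections import deque


class FlakyDatabase:
    """Stand-in database that can be switched off to simulate an outage."""

    def __init__(self):
        self.rows = {}
        self.available = True

    def upsert(self, key, value):
        if not self.available:
            raise ConnectionError("database unavailable")
        self.rows[key] = value


class WriteBehindCache:
    def __init__(self, database):
        self.cache = {}
        self.pending = deque()
        self.database = database

    def write(self, key, value):
        # The application touches only the cache and an ordered log of writes;
        # it never waits on database I/O.
        self.cache[key] = value
        self.pending.append((key, value))

    def drain(self):
        """Flush pending writes in order; stop at the first failure."""
        flushed = 0
        while self.pending:
            key, value = self.pending[0]
            try:
                self.database.upsert(key, value)
            except ConnectionError:
                break  # Leave the write queued so a later drain can retry it.
            self.pending.popleft()
            flushed += 1
        return flushed


db = FlakyDatabase()
wb = WriteBehindCache(db)
db.available = False
wb.write("order:1", "created")
wb.write("order:1", "paid")
print(wb.drain(), len(wb.pending))  # drain during the outage
db.available = True
print(wb.drain(), db.rows)          # drain after recovery
```

Writes are applied to the database in the order they were made, so the last value written by the application is also the last value the database sees.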
  13. Caching Pattern: Write Behind / Adapt
    • Removes I/O pressure from the app
    • Allows true horizontal scalability
    • Ensures ordering and persistence
    • Minimizes DB code complexity
    • Fully handles DB unavailability
    [Diagram: Kafka Connect, Application, Cache, API. Notes: Transform and adapt records before delivery; Schema Registry for canonical models]
  14. Caching Pattern: Event Federation
    • Replicates data across regions
    • Keeps multiple regions in sync
    • Great for improving RPO and RTO
    • Handles lazy/slow networks well
    • Works well if it's used along with the Read-Through and Write-Through patterns
    [Diagram: Confluent Replicator <<MirrorMaker>>]
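The replication step behind event federation can be sketched as a copier that remembers how far it has read in the source region's log. This is an illustration with two Python lists standing in for topics in two regions; the class and region names are hypothetical and this is not how Confluent Replicator or MirrorMaker are configured.

```python
class Replicator:
    """Copies records from a source log to a target log, remembering its position."""

    def __init__(self, source_log, target_log):
        self.source_log = source_log
        self.target_log = target_log
        self.next_offset = 0  # first source offset not yet replicated

    def poll(self, max_records=100):
        # Copy at most max_records per call; a slow link just means more calls.
        batch = self.source_log[self.next_offset:self.next_offset + max_records]
        self.target_log.extend(batch)
        self.next_offset += len(batch)
        return len(batch)

    def lag(self):
        # Records written in the source region but not yet visible in the target.
        return len(self.source_log) - self.next_offset


us_east, eu_west = [], []
replicator = Replicator(us_east, eu_west)
us_east.extend([("k1", "v1"), ("k2", "v2"), ("k3", "v3")])
replicator.poll(max_records=2)
print(replicator.lag())
replicator.poll()
print(eu_west == us_east)
```

The lag is a rough proxy for the data that would be at risk if the source region were lost at that moment, which is why keeping it small helps the RPO.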
  15. Kafka Connect support for In-Memory Caches
    • The connector for Redis is open source and available on Confluent Hub
    • The connector for Memcached is open source and available on Confluent Hub
    • Connectors exist for both the GridGain and Apache Ignite implementations
    • The connector for Infinispan is open source and maintained by Red Hat
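Once a connector is installed, it is usually created by posting its configuration to the Kafka Connect REST API (`POST /connectors`, port 8083 by default). The sketch below only builds that request and does not send it. The connector class and the `redis.hosts` property are assumptions based on the Redis connector published on Confluent Hub and should be checked against that connector's documentation; the topic and connector names are hypothetical.

```python
import json
import urllib.request


def build_create_connector_request(connect_url, name, config):
    """Build (but do not send) a POST /connectors request for Kafka Connect."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=connect_url.rstrip("/") + "/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


redis_sink_config = {
    # Assumption: class name and redis.hosts as documented by the Redis
    # connector on Confluent Hub; verify before use.
    "connector.class": "com.github.jcustenborder.kafka.connect.redis.RedisSinkConnector",
    "redis.hosts": "localhost:6379",
    "topics": "songs",  # hypothetical topic name
    "tasks.max": "1",
}

request = build_create_connector_request(
    "http://localhost:8083", "redis-sink", redis_sink_config
)
print(request.get_method(), request.full_url)
# To actually submit it: urllib.request.urlopen(request)
```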
  16. Frameworks for other In-Memory Caches
    • Oracle provides HotCache from GoldenGate for Oracle Coherence
    • Hazelcast has the Jet framework, which provides support for Kafka
    • Pivotal GemFire (Apache Geode) has good support from Spring
    • Good news: you can always write your own sink using the Connect API
    [Diagram labels: Oracle GoldenGate Hazelcast Jet Spring Data Spring Kafka Connect Framework Any Cache]
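The real Connect API is Java: a custom sink extends `SinkTask` and implements methods such as `start`, `put`, and `stop`. The Python class below is only an analogue of that lifecycle, meant to show how little logic a cache sink needs. The `key.prefix` setting, the record layout, and the dict used as a cache are all hypothetical.

```python
class DictCacheSinkTask:
    """Python analogue of a Connect sink task writing to a key-value cache."""

    def start(self, props):
        # props mirrors the string-to-string connector configuration.
        self.prefix = props.get("key.prefix", "")  # hypothetical setting
        self.cache = {}                            # stand-in for a cache client

    def put(self, records):
        # Called repeatedly with batches of records read from the topic.
        for record in records:
            key = self.prefix + record["key"]
            if record["value"] is None:
                self.cache.pop(key, None)          # tombstone: delete the entry
            else:
                self.cache[key] = record["value"]

    def stop(self):
        # A real task would close its cache connection here.
        pass


task = DictCacheSinkTask()
task.start({"key.prefix": "song:"})
task.put([{"key": "1", "value": "Imagine"}, {"key": "2", "value": "Hey Jude"}])
task.put([{"key": "2", "value": None}])
task.stop()
print(task.cache)
```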
  17. Interested in DB CDC? Then meet Debezium!
    • Amazing CDC technology to pull data out of databases and into Kafka
    • Works at the log level, which means a true CDC implementation for your projects instead of record polling
    • Open source, maintained by Red Hat, with broad support for many popular databases
    • It is built on top of Kafka Connect
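The difference between log-level CDC and record polling can be shown with a toy example. The events below are loosely modeled on the `op` and `after` fields of Debezium's change event envelope ("c" for create, "u" for update, "d" for delete), heavily simplified and with a hypothetical `key` field. A poller that looks at the table between two polls sees only the final state, while the log carries every change.

```python
def poll_snapshot(table):
    """Record polling: all a poller can see is the table's current state."""
    return dict(table)


def apply_change_log(change_log, state):
    """Log-based CDC: replay every committed change, including deletes."""
    for event in change_log:
        if event["op"] == "d":
            state.pop(event["key"], None)
        else:  # "c" (create) or "u" (update)
            state[event["key"]] = event["after"]
    return state


# Hypothetical sequence of committed changes between two polls.
change_log = [
    {"op": "c", "key": 1, "after": "pending"},
    {"op": "u", "key": 1, "after": "shipped"},
    {"op": "c", "key": 2, "after": "pending"},
    {"op": "d", "key": 2, "after": None},
]
table = apply_change_log(change_log, {})  # the database's own final state

seen_by_polling = poll_snapshot(table)    # final rows only
seen_by_cdc = len(change_log)             # every intermediate change
print(seen_by_polling, seen_by_cdc)
```

In this run the poller never learns that row 2 existed or that row 1 was ever pending, whereas a log-based consumer receives all four events.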
  18. Support for Running Kafka Connect Servers
    • Run it yourself on bare metal: https://kafka.apache.org/downloads https://www.confluent.io/download
    • IaaS on AWS or Google Cloud: https://github.com/confluentinc/ccloud-tools
    • Running using Docker containers: https://hub.docker.com/r/confluentinc/cp-kafka-connect/
    • Running using Kubernetes: https://github.com/confluentinc/cp-helm-chart https://www.confluent.io/confluent-operator/