Kafka For Rubyists - Advanced Karafka

karol.galanciak

February 05, 2021

  1. Advanced Karafka

  2. Karafka hooks/events
    - Based on dry-monitor
    - You can subscribe to a lot of events to add
    logging, instrumentation, or error handling with
    external services, to extend logic, etc.

  3. Karafka hooks/events
    # 1. Subscribing with a block
    Karafka.monitor.subscribe("sync_producer.call.retry") do |event|
      do_something_with_the_event(event)
    end

    # 2. Subscribing with a listener object
    class ExampleListener
      def self.on_sync_producer_call_retry(_event)
      end
    end

    Karafka.monitor.subscribe(ExampleListener)

  4. Karafka hooks/events - supported events
    - params.params.deserialize
    - params.params.deserialize.error
    - connection.listener.before_fetch_loop
    - connection.listener.fetch_loop
    - connection.listener.fetch_loop.error
    - connection.client.fetch_loop.error
    - connection.batch_delegator.call
    - connection.message_delegator.call
    - fetcher.call.error
    - backends.inline.process
    - process.notice_signal
    - consumers.responders.respond_with
    - async_producer.call.error
    - async_producer.call.retry
    - sync_producer.call.error
    - sync_producer.call.retry
    - app.initializing
    - app.initialized
    - app.running
    - app.stopping
    - app.stopping.error
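
    Each event name above maps to a listener method: dots become
    underscores and an `on_` prefix is added, which is how the earlier
    `ExampleListener` ends up responding to `on_sync_producer_call_retry`.
    A simplified sketch of that naming convention (not the actual
    dry-monitor implementation):

```ruby
# Simplified sketch of the event-name-to-method convention used by
# dry-monitor style listeners; not the actual library code.
def handler_method_for(event_name)
  :"on_#{event_name.tr('.', '_')}"
end

p handler_method_for("sync_producer.call.retry")
# => :on_sync_producer_call_retry
```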

  5. WaterDrop
    - A standalone gem used by Karafka for publishing
    messages
    - You can use WaterDrop directly, but it’s better to
    use Responders, as they provide some extras
    and are more convenient to use
    - Version 2.0 already exists, but the current
    Karafka version (1.4) uses WaterDrop 1.4, so
    keep that in mind when reading the docs

  6. WaterDrop
    # sync
    WaterDrop::SyncProducer.call(
      { user_id: 1 }.to_json,
      topic: "users", key: "user-1", partition_key: "1"
    )

    # async
    WaterDrop::AsyncProducer.call({ user_id: 1 }.to_json, topic: "users")

  7. Async Producers
    - Non-blocking - the operations involving
    publishing something to Kafka can get some
    extra performance gain
    - Somewhat protective against Kafka being
    unavailable
    - Could lead to loss of messages (some of them
    might not be published at all)

  8. Async Responders
    class ExampleResponder < ApplicationResponder
      topic :sync_topic
      topic :async_topic, async: true
    end

  9. Serializers
    - Easily customisable when using Responders
    - By default, a JSON serializer is used
    - Requires an equivalent deserializer on the
    consumer’s side

  10. Example XML Serializer
    class KarafkaResponderXmlSerializer
      def self.call(object)
        object.to_xml
      end
    end

    class ExampleResponder < ApplicationResponder
      topic :users, serializer: KarafkaResponderXmlSerializer

      def respond(user)
        respond_to :users, user
      end
    end

  11. Deserializers
    - Configurable when declaring a topic on the
    consumer’s side
    - By default, a JSON deserializer is used

  12. Deserializers
    class KarafkaExampleXmlDeserializer
      def self.call(params)
        Hash.from_xml(params.raw_payload)
      end
    end

    KarafkaApp.routes.draw do
      topic :users do
        consumer UserConsumer
        deserializer KarafkaExampleXmlDeserializer
      end
    end

  13. Responding from consumers
    class PagesConsumer < ApplicationConsumer
      def consume
        respond_with page_id: 1
      end
    end

    class PagesResponder < ApplicationResponder
      topic :pages_from_consumer

      def respond(payload_with_page_id)
        respond_to :pages_from_consumer, payload_with_page_id
      end
    end

  14. Responding from consumers
    KarafkaApp.consumer_groups.draw do
      consumer_group :group_for_kafka_example do
        batch_fetching true

        topic :pages do
          consumer PagesConsumer
          responder PagesResponder
          batch_consuming true
        end
      end
    end

  15. Testing Karafka - consumers
    - Dedicated gem for testing consumers:
    “karafka-testing”
    - It brings 2 helpers:
    - karafka_consumer_for
    - publish_for_karafka

  16. Testing Karafka - consumers
    RSpec.configure do |config|
      config.include Karafka::Testing::RSpec::Helpers
    end

    RSpec.describe UsersConsumer do
      subject(:consumer) { karafka_consumer_for(:users) }

      before do
        publish_for_karafka({ "user_id" => 1 }.to_json)
      end

      it "does some stuff" do
        # some potential mock
        consumer.consume
        # do some assertion here
      end
    end

  17. Testing Karafka - responders
    WaterDrop.setup do |config|
      config.deliver = !Rails.env.test?
    end

    RSpec.describe UsersResponder do
      subject(:responder) { described_class.new }

      describe "#call" do
        let(:payload) { { "user_id" => 1 } }
        let(:data) do
          [[payload.to_json, { topic: "users" }]]
        end

        it "publishes stuff" do
          responder.call(payload)
          expect(responder.messages_buffer["users"]).to eq data
        end
      end
    end

  18. karafka-sidekiq-backend
    - A separate gem
    - Useful when you need to maximize the
    throughput on the consumers’ side
    - High price to pay: messages no longer
    ordered :(

  19. karafka-sidekiq-backend
    # 1.
    class KarafkaApp < Karafka::App
      setup do |config|
        config.backend = :sidekiq
      end
    end

    class ApplicationWorker < Karafka::BaseWorker
    end

    # 2.
    KarafkaApp.routes.draw do
      consumer_group :example_consumer_group do
        topic :users do
          backend :sidekiq
          consumer UserConsumer
          worker KarafkaWorkers::UserWorker
          interchanger Interchangers::UserInterchanger # optional
        end
      end
    end

  20. Manual offset management
    - Make sure that you know what you are doing
    and why; in most cases you don’t need this
    feature
    - Karafka handles offset management out of the
    box: it commits offsets after processing an
    individual message or a batch (depending on
    the “batch_fetching” setting)

  21. Manual offset management
    class App < Karafka::App
      setup do |config|
        config.kafka.automatically_mark_as_consumed = false
      end

      consumer_groups.draw do
        consumer_group :users do
          automatically_mark_as_consumed false
        end

        consumer_group :accounts do
          automatically_mark_as_consumed true
        end
      end
    end

  22. Manual offset management
    class UsersConsumer < ApplicationConsumer
      def consume
        do_something_with_the_batch(params_batch)
        mark_as_consumed!(params_batch.last) # blocking/sync operation
        # or
        mark_as_consumed(params_batch.last) # non-blocking/async operation
      end
    end

  23. What if there is an exception on the consumer’s side?
    - If the consumer blows up with an error, it will
    stop for a while (configurable) and retry later
    - The messages will never be skipped
    - That means that the consumer will get stuck
    and not process any other messages until the
    issue is addressed
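
    If skipping a bad message is acceptable for a given topic, one
    workaround is to rescue inside the consumer and report the error
    instead of letting it bubble up and pause the partition. A sketch
    only; `ErrorReporter` and `process_message` are placeholder names:

```ruby
class UsersConsumer < ApplicationConsumer
  def consume
    params_batch.each do |message|
      process_message(message)
    rescue StandardError => e
      # Report and continue instead of blocking the whole partition.
      # ErrorReporter and process_message are placeholders - plug in
      # your own error tracker and processing logic.
      ErrorReporter.capture_exception(e)
    end
  end
end
```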

  24. What if there is an exception on the consumer’s side?
    - By default, the consumer will retry every 10
    seconds (configurable via the “pause_timeout”
    config param)
    - You can also enable exponential backoff
    (“pause_exponential_backoff”, disabled by
    default). It might be a good idea to also set
    “pause_max_timeout” so the retry delay doesn’t
    grow out of control
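
    These settings live in the app setup block; a minimal sketch using
    the 1.4 setting names mentioned above (the values are illustrative):

```ruby
class KarafkaApp < Karafka::App
  setup do |config|
    config.pause_timeout = 10               # seconds to pause before retrying
    config.pause_exponential_backoff = true # grow the pause on repeated failures
    config.pause_max_timeout = 600          # cap for the backed-off pause
  end
end
```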

  25. Integration with Sentry
    module KarafkaSentryListener
      PROBLEM_POSTFIXES = %w[
        _error
        _retry
      ].freeze

      class << self
        def method_missing(method_name, *args, &block)
          return super unless eligible?(method_name)

          Raven.capture_exception(args.last[:error])
        end

        def respond_to_missing?(method_name, include_private = false)
          eligible?(method_name) || super
        end

        private

        def eligible?(method_name)
          PROBLEM_POSTFIXES.any? do |postfix|
            method_name.to_s.end_with?(postfix)
          end
        end
      end
    end

  26. Integration with Sentry
    # in an initializer
    Karafka.monitor.subscribe(KarafkaSentryListener)

  27. Integration with NewRelic
    class KarafkaNewRelicListener
      class << self
        def method_missing(method_name, *args, &block)
          return super unless method_name.to_s.end_with?("_error")

          NewRelic::Agent.notice_error(args.last[:error])
        end

        def respond_to_missing?(method_name, include_private = false)
          method_name.to_s.end_with?("_error") || super
        end
      end
    end

  28. Integration with NewRelic
    # in an initializer
    Rails.application.config.to_prepare do
      # Ensure #consume exists on the base class so the tracer can wrap it
      Karafka::BaseConsumer.class_eval do
        def consume(*)
        end
      end

      Karafka::BaseConsumer.descendants.each do |consumer_class|
        consumer_class.class_eval do
          include ::NewRelic::Agent::Instrumentation::ControllerInstrumentation
          add_transaction_tracer :consume, category: :task
        end
      end
    end

    Karafka.monitor.subscribe(KarafkaNewRelicListener)

  29. Thanks!
