
Kafka For Rubyists - Advanced Karafka

karol.galanciak

February 05, 2021

Transcript

  1. Karafka hooks/events
    - Based on dry-monitor
    - You can subscribe to many events to add logging, instrumentation, error handling with external services, extra logic, etc.
  2. Karafka hooks/events
        # 1. Subscribe to a single event with a block
        Karafka.monitor.subscribe("sync_producer.call.retry") do |event|
          do_something_with_the_event(event)
        end

        # 2. Subscribe a listener class (defined before it is subscribed)
        class ExampleListener
          def self.on_sync_producer_call_retry(_event)
          end
        end

        Karafka.monitor.subscribe(ExampleListener)
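
To make the two subscription styles concrete, here is a minimal sketch in plain Ruby of how a monitor might dispatch an event to blocks and to listener classes. `ToyMonitor` and its `instrument` method are hypothetical stand-ins, not Karafka's or dry-monitor's implementation; the only detail taken from the slides is that the event `sync_producer.call.retry` corresponds to the listener method `on_sync_producer_call_retry`.

```ruby
# Hypothetical stand-in for Karafka.monitor; illustrates dispatch only.
class ToyMonitor
  def initialize
    @blocks = Hash.new { |hash, key| hash[key] = [] }
    @listeners = []
  end

  # Accepts either an event name with a block, or a listener object/class.
  def subscribe(event_or_listener, &block)
    if block
      @blocks[event_or_listener] << block
    else
      @listeners << event_or_listener
    end
  end

  def instrument(event_name, payload = {})
    @blocks[event_name].each { |handler| handler.call(payload) }

    # "sync_producer.call.retry" -> :on_sync_producer_call_retry
    method_name = :"on_#{event_name.tr('.', '_')}"
    @listeners.each do |listener|
      listener.public_send(method_name, payload) if listener.respond_to?(method_name)
    end
  end
end

class ExampleListener
  def self.received
    @received ||= []
  end

  def self.on_sync_producer_call_retry(event)
    received << event
  end
end

monitor = ToyMonitor.new
from_block = []
monitor.subscribe("sync_producer.call.retry") { |event| from_block << event }
monitor.subscribe(ExampleListener)
monitor.instrument("sync_producer.call.retry", attempt: 2)
```

Both the block and the listener class end up with the same payload. A listener class is convenient when one object should react to several events, since it only needs one `on_...` method per event.
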
  3. Karafka hooks/events - supported events
    - params.params.deserialize
    - params.params.deserialize.error
    - connection.listener.before_fetch_loop
    - connection.listener.fetch_loop
    - connection.listener.fetch_loop.error
    - connection.client.fetch_loop.error
    - connection.batch_delegator.call
    - connection.message_delegator.call
    - fetcher.call.error
    - backends.inline.process
    - process.notice_signal
    - consumers.responders.respond_with
    - async_producer.call.error
    - async_producer.call.retry
    - sync_producer.call.error
    - sync_producer.call.retry
    - app.initializing
    - app.initialized
    - app.running
    - app.stopping
    - app.stopping.error
  4. WaterDrop
    - Standalone gem used by Karafka for publishing messages
    - You can use WaterDrop directly, but it's better to use Responders, as they provide some extras and are more convenient to use
    - WaterDrop 2.0 is already out, but the current Karafka version (1.4) uses WaterDrop 1.4, so keep that in mind when reading the docs
  5. WaterDrop
        # sync
        WaterDrop::SyncProducer.call(
          { user_id: 1 }.to_json,
          topic: "users",
          key: "user-1",
          partition_key: "1"
        )

        # async
        WaterDrop::AsyncProducer.call({ user_id: 1 }.to_json, topic: "users")
  6. Async Producers
    - Non-blocking: operations that publish something to Kafka can get some extra performance gain
    - Offer some protection against Kafka being unavailable
    - Could lead to loss of messages (some of them might not be published at all)
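
The trade-off above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not WaterDrop's code: `ToyAsyncProducer`, `flush`, `pending` and `delivered` are hypothetical names, and the array `delivered` stands in for whatever actually reached Kafka.

```ruby
# Hypothetical in-memory model of an async producer; not WaterDrop's code.
class ToyAsyncProducer
  attr_reader :delivered

  def initialize
    @buffer = Queue.new # messages accepted but not yet sent
    @delivered = []     # stands in for what actually reached Kafka
  end

  # Returns immediately: the caller never waits for the broker.
  def call(message)
    @buffer << message
    nil
  end

  # In a real producer a background thread would typically do this.
  def flush
    @delivered << @buffer.pop until @buffer.empty?
  end

  def pending
    @buffer.size
  end
end

producer = ToyAsyncProducer.new
producer.call("user-1")
producer.call("user-2")
producer.flush
producer.call("user-3") # accepted, but so far it exists only in memory
```

At this point `user-3` has been accepted but not delivered. If the process were killed before the next flush, that message would never be published, which is the loss scenario the slide warns about.
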
  7. Serializers
    - Easily customisable when using Responders
    - By default, the JSON Serializer is used
    - Requires an equivalent Deserializer on the Consumer's side
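
The last point is easiest to see as a round trip. The sketch below pairs a JSON serializer with a matching deserializer using only the standard library. The class names and the `FakeParams` struct are illustrative, and the assumption that a deserializer reads the raw message from `params.raw_payload` should be checked against the docs for your Karafka version.

```ruby
require "json"

# Hypothetical stand-in for the params object a deserializer receives.
FakeParams = Struct.new(:raw_payload)

# Producer side: turns an object into the string that is published.
class JsonSerializer
  def self.call(object)
    object.to_json
  end
end

# Consumer side: must understand exactly the format produced above.
class JsonDeserializer
  def self.call(params)
    JSON.parse(params.raw_payload)
  end
end

raw = JsonSerializer.call({ "user_id" => 1 })
decoded = JsonDeserializer.call(FakeParams.new(raw))
```

If the producer switched to XML while the consumer kept a JSON deserializer, `JSON.parse` would raise on every message, so both sides need to change together.
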
  8. Example XML Serializer
        class KarafkaResponderXmlSerializer
          def self.call(object)
            object.to_xml
          end
        end

        class ExampleResponder < ApplicationResponder
          topic :users, serializer: KarafkaResponderXmlSerializer

          def respond(user)
            respond_to :users, user
          end
        end
  9. Responding from consumers
        class PagesConsumer < ApplicationConsumer
          def consume
            respond_with page_id: 1
          end
        end

        class PagesResponder < ApplicationResponder
          topic :pages_from_consumer

          def respond(payload_with_page_id)
            respond_to :pages_from_consumer, payload_with_page_id
          end
        end
  10. Responding from consumers
        KarafkaApp.consumer_groups.draw do
          consumer_group :group_for_kafka_example do
            batch_fetching true

            topic :pages do
              consumer PagesConsumer
              responder PagesResponder
              batch_consuming true
            end
          end
        end
  11. Testing Karafka - consumers
    - Dedicated gem for testing consumers: "karafka-testing"
    - It brings 2 helpers:
      - karafka_consumer_for
      - publish_for_karafka
  12. Testing Karafka - consumers
        RSpec.configure do |config|
          config.include Karafka::Testing::RSpec::Helpers
        end

        RSpec.describe UsersConsumer do
          subject(:consumer) { karafka_consumer_for(:users) }

          before do
            publish_for_karafka({ "user_id" => 1 }.to_json)
          end

          it "does some stuff" do
            # some potential mock
            consumer.consume
            # do some assertion here
          end
        end
  13. Testing Karafka - responders
        WaterDrop.setup do |config|
          config.deliver = !Rails.env.test?
        end

        RSpec.describe UsersResponder do
          subject(:responder) { described_class.new }

          describe "#call" do
            let(:payload) { { "user_id" => 1 } }
            let(:data) do
              [[payload.to_json, { topic: "users" }]]
            end

            it "publishes stuff" do
              responder.call(payload)

              expect(responder.messages_buffer["users"]).to eq data
            end
          end
        end
  14. karafka-sidekiq-backend
    - A separate gem
    - Useful when you need to maximize the throughput on the consumers' side
    - High price to pay: messages are no longer ordered :(
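
Why ordering is lost can be shown with a toy simulation of a worker pool. Nothing here uses Sidekiq or Karafka: `completion_order` is a hypothetical helper, and each job is just an offset plus an assumed processing cost in arbitrary time units. Every job goes to whichever worker becomes free first, and the function returns the order in which jobs finish.

```ruby
# Toy simulation: returns message offsets in the order their processing ends.
# Each job is handed to whichever worker becomes free first.
def completion_order(jobs, workers:)
  free_at = Array.new(workers, 0) # time at which each worker is next idle

  finished = jobs.map do |job|
    worker = free_at.each_with_index.min.last # index of the earliest-free worker
    free_at[worker] += job[:cost]
    [free_at[worker], job[:offset]]
  end

  finished.sort.map(&:last)
end

jobs = [
  { offset: 0, cost: 5 }, # one slow message at the head of the partition
  { offset: 1, cost: 1 },
  { offset: 2, cost: 1 },
  { offset: 3, cost: 1 }
]

sequential = completion_order(jobs, workers: 1)
parallel = completion_order(jobs, workers: 2)
```

With one worker the messages complete in offset order. With two workers the slow first message finishes last, so any logic that relies on per-partition ordering would see offsets 1 to 3 handled before offset 0.
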
  15. karafka-sidekiq-backend
        # 1
        class KarafkaApp < Karafka::App
          setup do |config|
            config.backend = :sidekiq
          end
        end

        class ApplicationWorker < Karafka::BaseWorker
        end

        # 2
        KarafkaApp.routes.draw do
          consumer_group :example_consumer_group do
            topic :users do
              backend :sidekiq
              consumer UserConsumer
              worker KarafkaWorkers::UserWorker
              interchanger Interchangers::UserInterchanger # optional
            end
          end
        end
  16. Manual offset management
    - Make sure that you know what you are doing and why; in most cases you don't need this feature
    - Karafka handles offset management out of the box: it commits offsets after processing an individual message or a batch (depending on the "batch_fetching" setting)
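
The practical difference between committing after each message and committing after a whole batch shows up when processing fails halfway through. The following sketch is a simplified model, not Karafka code: `resume_offset` and its parameters are hypothetical, and "committed" follows the Kafka convention of storing the next offset to read.

```ruby
# Returns the offset a consumer would resume from after a crash.
# `batch` is a list of offsets, `crash_at` is the offset whose processing
# fails (nil for no failure), `commit_per` is :message or :batch.
def resume_offset(batch, crash_at:, commit_per:)
  committed = batch.first # next offset to read; nothing processed yet

  batch.each do |offset|
    return committed if offset == crash_at # failed before this one finished

    committed = offset + 1 if commit_per == :message
  end

  batch.last + 1 # whole batch processed, commit past its end
end

batch = [10, 11, 12, 13]
per_message = resume_offset(batch, crash_at: 12, commit_per: :message)
per_batch = resume_offset(batch, crash_at: 12, commit_per: :batch)
no_crash = resume_offset(batch, crash_at: nil, commit_per: :batch)
```

With per-message commits only the failing message is seen again. With per-batch commits the consumer restarts from offset 10 and reprocesses messages 10 and 11, so handlers should be idempotent in either mode.
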
  17. Manual offset management
        class App < Karafka::App
          setup do |config|
            config.kafka.automatically_mark_as_consumed = false
          end

          consumer_groups.draw do
            consumer_group :users do
              automatically_mark_as_consumed false
            end

            consumer_group :accounts do
              automatically_mark_as_consumed true
            end
          end
        end
  18. Manual offset management
        class UsersConsumer < ApplicationConsumer
          def consume
            do_something_with_the_batch(params_batch)

            mark_as_consumed!(params_batch.last) # blocking/sync operation
            # or
            mark_as_consumed(params_batch.last) # non-blocking/async operation
          end
        end
  19. What if there is an exception on the consumer's side?
    - If the consumer blows up with an error, it will stop for a while (configurable) and retry later
    - The messages will never be skipped
    - That means that the consumer will get stuck and not process any other messages until the issue is addressed
  20. What if there is an exception on the consumer's side?
    - By default, the worker will retry consuming every 10 seconds (configurable via the "pause_timeout" config param)
    - You can also enable exponential backoff ("pause_exponential_backoff", disabled by default). It might also be a good idea to set "pause_max_timeout" so the retry delay does not grow out of control
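
To get a feel for how these three settings interact, the sketch below computes the pause before each retry. `pause_schedule` is a hypothetical helper, and the doubling rule is an assumption made for illustration; the exact formula used by Karafka and the underlying Kafka client may differ.

```ruby
# Pause lengths in seconds for consecutive failed attempts.
# The doubling rule is an assumption made for illustration.
def pause_schedule(attempts:, pause_timeout:, exponential_backoff: false, pause_max_timeout: nil)
  (0...attempts).map do |attempt|
    pause = exponential_backoff ? pause_timeout * (2**attempt) : pause_timeout
    pause_max_timeout ? [pause, pause_max_timeout].min : pause
  end
end

fixed = pause_schedule(attempts: 4, pause_timeout: 10)
uncapped = pause_schedule(attempts: 5, pause_timeout: 10, exponential_backoff: true)
capped = pause_schedule(attempts: 5, pause_timeout: 10, exponential_backoff: true, pause_max_timeout: 60)
```

Without a cap, the fifth pause under this model is already 160 seconds; with a cap of 60 the delays level off, which is the reason for setting "pause_max_timeout" alongside the backoff.
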
  21. Integration with Sentry
        module KarafkaSentryListener
          PROBLEM_POSTFIXES = %w[
            _error
            _retry
          ].freeze

          class << self
            def method_missing(method_name, *args, &block)
              return super unless eligible?(method_name)

              Raven.capture_exception(args.last[:error])
            end

            def respond_to_missing?(method_name, include_private = false)
              eligible?(method_name) || super
            end

            private

            def eligible?(method_name)
              PROBLEM_POSTFIXES.any? do |postfix|
                method_name.to_s.end_with?(postfix)
              end
            end
          end
        end
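
The suffix-based dispatch used by the Sentry listener can be tried out without Sentry installed. The sketch below keeps the same `method_missing` / `respond_to_missing?` structure but replaces `Raven.capture_exception` with a hypothetical in-memory `reported` list; `ToyErrorListener` is an illustrative name, not part of Karafka or Sentry.

```ruby
# Same suffix-based dispatch as the Sentry listener, but with a hypothetical
# in-memory reporter instead of Raven so it runs without Sentry installed.
module ToyErrorListener
  PROBLEM_POSTFIXES = %w[_error _retry].freeze

  class << self
    def reported
      @reported ||= []
    end

    def method_missing(method_name, *args, &block)
      return super unless eligible?(method_name)

      reported << args.last[:error]
    end

    def respond_to_missing?(method_name, include_private = false)
      eligible?(method_name) || super
    end

    private

    def eligible?(method_name)
      method_name.to_s.end_with?(*PROBLEM_POSTFIXES)
    end
  end
end

error = StandardError.new("broker unavailable")
ToyErrorListener.on_sync_producer_call_error({ error: error })

# Events without a matching suffix fall through to the default behaviour.
ignored = begin
  ToyErrorListener.on_app_running({})
  false
rescue NoMethodError
  true
end
```

Because `respond_to_missing?` mirrors `method_missing`, a monitor that checks `respond_to?` before dispatching will only call the listener for `_error` and `_retry` events, so one listener covers every error event without a method per event.
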
  22. Integration with NewRelic
        class KarafkaNewRelicListener
          class << self
            def method_missing(method_name, *args, &block)
              return super unless method_name.to_s.end_with?("_error")

              NewRelic::Agent.notice_error(args.last[:error])
            end

            def respond_to_missing?(method_name, include_private = false)
              method_name.to_s.end_with?("_error") || super
            end
          end
        end
  23. Integration with NewRelic
        # in an initializer
        Rails.application.config.to_prepare do
          Karafka::BaseConsumer.class_eval do
            def consume(*)
            end
          end

          Karafka::BaseConsumer.descendants.each do |consumer_class|
            consumer_class.instance_eval do
              include ::NewRelic::Agent::Instrumentation::ControllerInstrumentation
              add_transaction_tracer :consume, category: :task
            end

            consumer_class.class_eval do
              include ::NewRelic::Agent::Instrumentation::ControllerInstrumentation
              add_transaction_tracer :consume, category: :task
            end
          end
        end

        Karafka.monitor.subscribe(KarafkaNewRelicListener)