Kafka & Karafka - Monitoring
Monitoring - Warning- No monitoring? Forget about using Kafka in production
Monitoring - Kafka cluster itself vs. application-related metrics- Both are critical- Karafka uses ruby-kafka (versions 1.x) under the hood, which has a great integration with Datadog/StatsD
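For reference, enabling the ruby-kafka Datadog reporter is mostly a matter of requiring it and pointing it at a StatsD agent. A minimal sketch, assuming the `dogstatsd-ruby` gem is in the bundle and that this runs before the Karafka app boots; the host and port values are placeholders, not recommendations:

```ruby
# Requiring this file enables the reporter (needs the dogstatsd-ruby gem).
require "kafka/datadog"

# Metric prefix; "ruby_kafka" is the library default and matches the
# metric names used on the following slides.
Kafka::Datadog.namespace = "ruby_kafka"

# Placeholder agent address: adjust to wherever the Datadog agent listens.
Kafka::Datadog.host = "127.0.0.1"
Kafka::Datadog.port = 8125
```

For a plain StatsD setup, ruby-kafka ships an analogous reporter loaded with `require "kafka/statsd"`.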
Kafka Broker metrics- Under-replicated partitions
Kafka Broker metrics- Active Controller Count
Kafka Broker metrics- messages_in_per_sec
Kafka Broker metrics- bytes_in_per_sec
Kafka Broker metrics- bytes_out_per_sec
Kafka Broker metrics- Leader count
Kafka Broker metrics- Offline Partitions Count
Kafka Broker metrics- Replication Max Lag
Kafka Broker metrics- (AWS MSK) data_logs_disk_used
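The broker metrics above usually end up as alert conditions. A minimal sketch in plain Ruby, assuming the values have already been collected into one hash per broker. The `broker_alerts` helper and the key names are hypothetical. The thresholds encode the usual expectations: no under-replicated or offline partitions, and exactly one active controller across the cluster.

```ruby
# Hypothetical snapshot shape: { broker_id => { metric_name => value } }.
# Key names are illustrative and not tied to any specific monitoring agent.
def broker_alerts(snapshots)
  alerts = []

  snapshots.each do |broker_id, metrics|
    if metrics.fetch(:under_replicated_partitions, 0) > 0
      alerts << "broker #{broker_id}: under-replicated partitions"
    end
    if metrics.fetch(:offline_partitions_count, 0) > 0
      alerts << "broker #{broker_id}: offline partitions"
    end
  end

  # Summed over all brokers, the active controller count should be exactly 1.
  controllers = snapshots.values.sum { |m| m.fetch(:active_controller_count, 0) }
  alerts << "cluster: expected 1 active controller, got #{controllers}" unless controllers == 1

  alerts
end
```

An empty result means none of the three conditions fired. Throughput metrics such as messages_in_per_sec are better watched as trends than as fixed thresholds, so they are left out of this sketch.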
Producer metrics - is stuff working at all?- ruby_kafka.api.calls
Producer metrics - are messages getting published?- ruby_kafka.producer.deliver.messages
Producer metrics - are messages getting published?- ruby_kafka.producer.deliver.attempts
Producer metrics - are messages getting published?- ruby_kafka.producer.deliver.errors
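One way to read the three producer counters together is to classify a time window by whether anything was delivered and whether any errors occurred. A rough sketch, assuming the counters for one window arrive as a hash keyed by metric name; the `producer_status` helper and its status labels are hypothetical:

```ruby
# `window` is a hypothetical hash of counter totals for one time window,
# keyed by the metric names shown on the slides.
def producer_status(window)
  delivered = window.fetch("ruby_kafka.producer.deliver.messages", 0)
  errors = window.fetch("ruby_kafka.producer.deliver.errors", 0)

  return :failing if errors > 0 && delivered.zero?  # errors and nothing published
  return :degraded if errors > 0                    # publishing, but with errors
  return :idle if delivered.zero?                   # nothing published, no errors
  :ok
end
```

Whether `:idle` deserves an alert depends on the application: for a producer that should always have traffic, zero delivered messages is itself a warning sign.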
Consumer metrics - does stuff work at all?- ruby_kafka.api.calls
Consumer metrics - does stuff work at all?- ruby_kafka.api.errors
Consumer metrics - are messages getting consumed?- ruby_kafka.consumer.lag{*} by {topic,partition}
Consumer metrics - are messages getting consumed?- ruby_kafka.consumer.messages by {topic,partition}
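Grouping lag by topic and partition, as in the query above, makes it easy to spot the one partition that is falling behind. A minimal sketch in plain Ruby, assuming lag samples are available as hashes; the sample shape, the `lagging_partitions` helper, and the threshold are all hypothetical:

```ruby
# Each sample is a hypothetical hash: { topic:, partition:, lag: }.
# Returns { [topic, partition] => worst_lag } for partitions over the threshold.
def lagging_partitions(samples, max_lag:)
  samples
    .group_by { |s| [s[:topic], s[:partition]] }
    .transform_values { |group| group.map { |s| s[:lag] }.max }
    .select { |_key, lag| lag > max_lag }
end
```

Using the worst lag seen per partition in the window is a design choice for this sketch; an average or the latest value would work too, depending on how noisy the lag is.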
Consumer metrics - is there anything wrong going on with consumers?- ruby_kafka.consumer.leave_group
Consumer metrics - is there anything wrong going on with consumers?- ruby_kafka.consumer.sync_group
Consumer metrics - is there anything wrong going on with consumers?- ruby_kafka.consumer.join_group
Consumer metrics - is there anything wrong going on with consumers?- ruby_kafka.fetcher.queue_size
Thanks!