I Can’t Believe It’s Not A Queue: Using Kafka with Rails - RailsConf 2016

hone
May 04, 2016

Video: https://www.youtube.com/watch?v=s3VQIGD5iGo

Your existing message system is great, until it gets overloaded. Then what? That's when you should try Kafka.

Kafka's designed to be resilient. It takes the stress out of moving from a Rails monolith into a scalable system of microservices. Since you can capture every event that happens in your app, it's great for logging. You can even use Kafka's distributed, ordered log to simulate production load in your staging environment.

Come and learn about Kafka, where it fits in your Rails app, and how to make it do the things that message queues simply can't.


Transcript

  1. None
  2. Happy Star Wars Day

  3. Terence Lee @hone02

  4. None
  5. #rubykaraoke

  6. #rubykaraoke, Thurs 7pm @ Offkey

  7. None
  8. I Can't Believe It's Not a Queue: Using Kafka with Rails
  9. Agenda • What is Kafka? • Kafka + Ruby • Use Case: Metrics • Other Patterns
  10. What is Kafka?

  11. Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.
  12. Distributed Publish Subscribe Messaging

  13. Fast Scalable Durable

  14. "hundreds of thousands to millions of messages a second on

    a small cluster" Tom Crayford Heroku Kafka
  15. None
  16. Producers & Consumers

  17. Messages: Byte Arrays -> String, JSON, any format

  18. Feed of Messages in Topics

  19. Each Topic Partition is an append-only log of ordered, immutable messages
  20. Offsets
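    An aside that is not in the deck: an offset is simply the position a message occupies within its partition, and it only ever grows. A minimal sketch using ruby-kafka's low-level fetch API (`fetch_messages` and the topic name are my choice of illustration, not from the talk):

      require "kafka"

      kafka = Kafka.new(seed_brokers: ["kafka1:9092", "kafka2:9092"])

      # Read partition 0 of "test-messages" from the start of the retained log;
      # each message reports the offset it was appended at.
      kafka.fetch_messages(topic: "test-messages", partition: 0, offset: :earliest).each do |message|
        puts "offset=#{message.offset} value=#{message.value}"
      end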

  21. Keyed Messages: messages with the same key are consumed by the same consumer

  22. Consumer Groups allow scaling per topic and ensure each message gets delivered at least once
  23. Consumer Groups
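    Not from the deck: a sketch tying the last three slides together. Messages with the same key land in the same partition, each partition is owned by exactly one member of a consumer group, and delivery is at least once, so processing should tolerate duplicates. (The group and topic names are invented; the in-memory dedupe is a toy.)

      require "kafka"

      kafka = Kafka.new(seed_brokers: ["kafka1:9092", "kafka2:9092"])
      consumer = kafka.consumer(group_id: "order-processors")
      consumer.subscribe("order_events")

      seen = {} # toy at-least-once guard; real apps dedupe in durable storage

      consumer.each_message do |message|
        id = "#{message.partition}:#{message.offset}"
        next if seen[id] # the same message can arrive twice, e.g. after a rebalance
        seen[id] = true
        # All events for a given key arrive here, in order, on one consumer.
        puts "key=#{message.key} offset=#{message.offset} value=#{message.value}"
      end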

  24. Kafka + Ruby

  25. jruby-kafka

  26. ruby-kafka

  27. Simple Producer

    require "kafka"

    kafka = Kafka.new(seed_brokers: ["kafka1:9092", "kafka2:9092"])
    producer = kafka.producer
  28. Send a message

    require "kafka"

    kafka = Kafka.new(seed_brokers: ["kafka1:9092", "kafka2:9092"])
    producer = kafka.producer
    producer.produce("hello1", topic: "test-messages")
  29. Keyed Message

    require "kafka"

    kafka = Kafka.new(seed_brokers: ["kafka1:9092", "kafka2:9092"])
    producer = kafka.producer
    producer.produce("hello1", topic: "test-messages")
    producer.produce("hello2", key: "x", topic: "test-messages")
  30. Message to a Partition

    require "kafka"

    kafka = Kafka.new(seed_brokers: ["kafka1:9092", "kafka2:9092"])
    producer = kafka.producer
    producer.produce("hello1", topic: "test-messages")
    producer.produce("hello2", key: "x", topic: "test-messages")
    producer.produce("hello3", topic: "test-messages", partition: 1)
  31. Deliver Messages

    require "kafka"

    kafka = Kafka.new(seed_brokers: ["kafka1:9092", "kafka2:9092"])
    producer = kafka.producer
    producer.produce("hello1", topic: "test-messages")
    producer.produce("hello2", key: "x", topic: "test-messages")
    producer.produce("hello3", topic: "test-messages", partition: 1)
    producer.deliver_messages
  32. Async Producer

    # `async_producer` will create a new asynchronous producer.
    producer = kafka.async_producer(
      # Trigger a delivery once 100 messages have been buffered.
      delivery_threshold: 100,

      # Trigger a delivery every second.
      delivery_interval: 1,
    )
  33. Serialization

    event = {
      "name" => "pageview",
      "url" => "https://example.com/posts/123",
      # ...
    }

    data = JSON.dump(event)
    producer.produce(data, topic: "events")
  34. Rails Producer

    # config/initializers/kafka_producer.rb
    require "kafka"

    # Configure the Kafka client with the broker hosts and the Rails logger.
    $kafka = Kafka.new(
      seed_brokers: ["kafka1:9092", "kafka2:9092"],
      logger: Rails.logger,
    )
  35. Rails Producer

    # ...
    # Set up an asynchronous producer that delivers its buffered messages
    # every ten seconds:
    $kafka_producer = $kafka.async_producer(
      delivery_interval: 10,
    )

    # Make sure to shut down the producer when exiting.
    at_exit { $kafka_producer.shutdown }
  36. Rails Producer

    class OrdersController < ApplicationController
      def create
        @order = Order.create!(params[:order])

        event = {
          order_id: @order.id,
          amount: @order.amount,
          timestamp: Time.now,
        }

        $kafka_producer.produce(event.to_json, topic: "order_events")
      end
    end
  37. Consumer API Experimental

  38. Consumer Groups

    consumer = kafka.consumer(group_id: "my-consumer")
    consumer.subscribe("greetings")

    consumer.each_message do |message|
      puts message.topic, message.partition
      puts message.offset, message.key, message.value
    end
  39. SSL

    Kafka.new(
      seed_brokers: ["kafka1:9092", "kafka2:9092"],
      ssl_client_cert: ENV['KAFKA_CLIENT_CERT'],
      ssl_client_cert_key: ENV['KAFKA_CLIENT_CERT_KEY'],
      ssl_ca_cert: ENV['KAFKA_TRUSTED_CERT']
    )
  40. Use Case: Metrics

  41. Build metrics based on web traffic

  42. Architecture

  43. Architecture

  44. Heroku Router Logs

    $ heroku logs -a issuetriage -p router
    2016-05-04T04:57:12.222253+00:00 heroku[router]: at=info method=GET path="/haiwen/seafile" host=issuetriage.herokuapp.com request_id=cf59a503-3159-4d7c-8287-3ba52d7c44df fwd="144.76.27.118" dyno=web.2 connect=0ms service=166ms status=200 bytes=36360
  45.–48. Heroku Router Logs (the same log line repeated across four build-up slides)
  49. Log Drain over HTTPS

    $ heroku drains:add \
        https://user:pass@logdrain.herokuapp.com/logs \
        -a issuetriage
  50. POST Request Body

    83 <40>1 2012-11-30T06:45:29+00:00 host app web.3 - State changed from starting to up
    119 <40>1 2012-11-30T06:45:26+00:00 host app web.3 - Starting process with command `bundle exec rackup config.ru -p 24405`
  51. POST Request Body

    83 <40>1 2012-11-30T06:45:29+00:00 host app web.3 - State changed from starting to up
    119 <40>1 2012-11-30T06:45:26+00:00 host app web.3 - Starting process with command `bundle exec rackup config.ru -p 24405`

    This does NOT conform to RFC 5424: it leaves out STRUCTURED-DATA but does not replace it with a NILVALUE.
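    The leading "83" and "119" are octet counts: each frame is prefixed with its length in bytes (the framing RFC 6587 calls "octet counting"), which is why the producer below wraps the body in OctetCountingFraming. A rough illustration of the framing itself (my own sketch; `each_frame` and `post_body` are hypothetical, not from the app):

      require "stringio"

      # Read the decimal length up to the first space, then read exactly that
      # many bytes as one syslog frame; repeat until the body is exhausted.
      def each_frame(body)
        io = StringIO.new(body)
        until io.eof?
          length = io.gets(" ").to_i
          yield io.read(length)
        end
      end

      each_frame(post_body) { |frame| puts frame }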
  52. Architecture

  53. Log Drain App (Producer)

    post "/process" do
      process_messages(body)
      status 202
      "Accepted"
    end
  54. Log Drain App (Producer)

    $kafka_pools = {
      producer: ConnectionPool.new(size: 5, timeout: 5) {
        Kafka.new(...).async_producer
      },
    }
  55. Log Drain App (Producer)

    def process_messages(body_text)
      messages = []
      stream = Syslog::Stream.new(
        Syslog::Stream::OctetCountingFraming.new(StringIO.new(body_text)),
        parser: Syslog::Parser.new(allow_missing_structured_data: true)
      )
      messages = stream.messages.to_a
  56. Log Drain App (Producer)

      $kafka_pools[:producer].with do |producer|
        messages.each do |message|
          producer.produce(message.to_h.to_json, topic: message.procid) if message.procid == "router"
        end
      end
    end
  57. Architecture

  58. Heroku Kafka

  59. Create a Heroku Kafka Cluster

    $ heroku addons:create heroku-kafka:beta-dev -a kafka-demo
    Creating kafka-reticulated-61055... done, (free)
    Adding kafka-reticulated-61055 to kafka-demo... done
    The cluster should be available in 15-45 minutes.
    Run `heroku kafka:wait` to wait until the cluster is ready.
    ! WARNING: Kafka is in beta. Beta releases have a higher risk of data loss and downtime.
    ! Use with caution.
    Use `heroku addons:docs heroku-kafka` to view documentation.
  60. Connecting to Heroku Kafka

    Kafka.new(
      seed_brokers: ENV['KAFKA_URL'],
      ssl_client_cert: ENV['KAFKA_CLIENT_CERT'],
      ssl_client_cert_key: ENV['KAFKA_CLIENT_CERT_KEY'],
      ssl_ca_cert: ENV['KAFKA_TRUSTED_CERT']
    )
  61. Heroku Kafka Plugin $ heroku plugins:install heroku-kafka

  62. Create a Topic $ heroku kafka:create router

  63. Cluster Info

    $ heroku kafka:info
    === KAFKA_URL
    Name:        kafka-reticulated-61055
    Created:     2016-04-19 19:54 UTC
    Plan:        Beta Dev
    Status:      available
    Version:     0.9.0.0
    Topics:      2 topics (see heroku kafka:list)
    Connections: 0 consumers (0 applications)
    Messages:    0.37 messages/s
    Traffic:     28 Bytes/s in / 12.1 KB/s out
  64. Topic Info

    $ heroku kafka:topic router
    === KAFKA_URL :: router
    Producers:          0.0 messages/second (0 Bytes/second) total
    Consumers:          20.8 KB/second total
    Partitions:         32 partitions
    Replication Factor: 1
    Compaction:         Compaction is disabled for router
    Retention:          24 hours
  65. Tail Topic

    $ heroku kafka:tail router
    router 20 2627 378 {"prival":158,"version":1,"timestamp":"2016-05-04 08:33:23 +0000","hostname":"ho
    router 20 2628 371 {"prival":158,"version":1,"timestamp":"2016-05-04 08:59:00 +0000","hostname":"ho
    router 20 2629 370 {"prival":158,"version":1,"timestamp":"2016-05-04 09:22:29 +0000","hostname":"ho
  66. Architecture

  67. Metrics Aggregator (Consumer)

    consumer = Kafka.new(...).consumer(group_id: "metrics")
    consumer.subscribe("router", default_offset: :latest)

    redis = Redis.new(url: ENV['REDIS_URL'])
    metrics = RouteMetrics.new(redis)

    consumer.each_message do |message|
      json = JSON.parse(message.value)
      route = Route.new(json)
      metrics.insert(route) if route.path
    end
  68. Metrics Aggregator (Consumer)

    def insert(route)
      path = route.path
      path_digest = Digest::SHA256.hexdigest(path)
      @redis.hset "routes", path, path_digest

      [:service, :connect].each do |metric|
        value = route.send(metric).to_i
        key = "#{path_digest}::#{metric}"
        @redis.hincrby key, "sum", value
        @redis.hincrby key, "count", 1
        @redis.hset key, "average", @redis.hget(key, "sum").to_i / @redis.hget(key, "count").to_f
      end

      @redis.hincrby "#{path_digest}::statuses", route.status, 1
    end
  69. Replay (Consumer)

    consumer = Kafka.new(...).consumer(group_id: "replay")
    consumer.subscribe("router", default_offset: :latest)
    client = HttpClient.httpClient(...)

    consumer.each_message do |message|
      json = JSON.parse(message.value)
      route = Route.new(json)
      controller.fork.start do
        client.get(java.net.URI.new("#{ENV['REPLAY_HOST']}#{route.path}")).then do |response|
          puts response.get_body.get_text
        end
      end
    end
  70. Demo Code https://github.com/hone/heroku-replay-ratpack

  71. Tom Crayford

  72. Other Patterns

  73. Messaging • Low Latency • High Throughput • Durability Guarantees
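    Not from the deck: the simplest queue-like round trip in ruby-kafka might look like this (the topic and group names are invented; `deliver_message` is the library's synchronous convenience call):

      require "kafka"
      require "json"

      kafka = Kafka.new(seed_brokers: ["kafka1:9092", "kafka2:9092"])

      # "Enqueue": a synchronous, broker-acknowledged (durable) write.
      kafka.deliver_message(JSON.dump(task: "send_welcome_email", user_id: 42), topic: "jobs")

      # "Dequeue": a consumer group spreads partitions across workers for throughput.
      consumer = kafka.consumer(group_id: "workers")
      consumer.subscribe("jobs")
      consumer.each_message do |message|
        job = JSON.parse(message.value)
        puts "performing #{job['task']} for user #{job['user_id']}"
      end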

  74. Activity Tracking • Real Time feed of User Activity • One Topic per Activity
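    One way "one topic per activity" could look, reusing the $kafka_producer global from the Rails slides (my sketch; the helper and the topic naming scheme are invented):

      require "json"

      # Route each activity type to its own topic so consumers can subscribe
      # only to the feeds they care about.
      def track(activity, payload)
        $kafka_producer.produce(JSON.dump(payload), topic: "activity.#{activity}")
      end

      track(:pageview, user_id: 42, url: "/posts/123")
      track(:signup,   user_id: 43, plan: "free")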
  75. Heroku Metrics

  76. Heroku API Event Bus

  77. Kafka's unique design can be used to help Rails apps become fast, scalable, and durable
  78. Thank You

  79. Joe Kutner @codefinger

  80. Community Office Hours Thurs. 4:10pm (Happy Hour) @ Heroku Booth
    Rails JRuby Heroku