My 2 Weeks Experience Trying Kafka with Go

This is the presentation I gave about Apache Kafka at Qiscus Tech Talk #113 at Block71 Yogyakarta. You can expect working sample code in this presentation, and you can also read the blog version here: https://medium.com/@yusufs/getting-started-with-kafka-in-golang-14ccab5fa26

Yusuf Syaifudin

November 28, 2018

Transcript

  1. Today’s (glorious) blather
    01 The Background
    02 What Is Kafka?
    03 Why Kafka?
    04 Running a Kafka Cluster
    05 Creating a Producer and Consumer with Go
  2. Background Question
    • Why is our API (particularly the API that writes directly into the DB) so slow?
    • How about moving all the time-consuming tasks into a worker?
  3. Why Is Our API So Slow?
    Most likely because we write directly into the database. In addition, we run all the processing for sending push notifications, webhooks, and caching the comment object right after we receive the request. Even though it runs in goroutines, it is all done on the same machine.
  4. Background Question
    • We need a fast storage engine that persists to disk to save our request log.
    • We need to be able to query that data once we have it.
  5. We tried some tools, but they didn’t work
    • We gave Elasticsearch a try, but it took our service down whenever Elasticsearch was unreachable.
    • We moved to Papertrail, but we couldn’t query on specific fields.
    • We tried building a Papertrail-like service that accepts data via UDP, but whether it inserts into an RDBMS or a NoSQL store, it can still easily fail when it hits thousands of inserts per second.
  6. Why Is Our API So Slow?
    Image source: https://cheesecakelabs.com/blog/simple-chat-architecture-mvp/
    Suppose we have a system that handles data like this. (PS: we use neither Django nor NodeJS; this is just an illustration.)
  7. What’s the problem?
    • Writing directly into the database means you keep having to add more resources (whether scaling vertically or horizontally), and it is easy to hit a ceiling when the request volume gets really big. We need a streaming platform.
    • Instead of writing directly into the database, we can store the data temporarily and later consume it in batches to save it into persistent storage.
    • Some tools for queueing messages: RabbitMQ, Kafka.
  8. How about moving all the time-consuming tasks into a worker?
    So, we need a message queue that meets the following criteria:
    • It must have a publish-subscribe mechanism.
    • It must be scalable.
    • Writes must be faster than a DB (or at least our current DB).
  9. So, let’s give Apache Kafka a try to handle this problem
    Instead of writing directly into each service (Redis, DB, log), we need a streaming tool to stream our data: Kafka.
  10. What Is Apache Kafka?
    • Apache Kafka is an open-source stream-processing software platform that started out at LinkedIn.
    • It’s written in Scala and Java.
    • Kafka is based on a commit log, similar to what a common RDBMS uses internally.
  11. Commit Log
    • It’s just like an ordinary queue, in which each message is served in FIFO order.
    • There is no further processing that adds overhead the way a DB does, such as locking tables, checking foreign keys, building indexes, etc.
  12. But, not just one
    For each topic, the Kafka cluster maintains a partitioned log that looks like this.
    Source: https://kafka.apache.org/documentation/#design
  13. Consuming Messages from Kafka
    • In Kafka you can consume data from a specific partition of a topic, or you can consume from all partitions (see the sketch below).
    • Interestingly, you can subscribe to the data with several clients/workers and have each of them retrieve different data from different partitions by using a consumer group. So, what is a consumer group?
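    A minimal sketch of reading from one specific partition, assuming the segmentio/kafka-go client and a broker at localhost:19092; the actual sample code may use a different library:

      package main

      import (
          "context"
          "log"

          "github.com/segmentio/kafka-go"
      )

      func main() {
          // Read only from partition 2 of the "message" topic (no consumer group).
          r := kafka.NewReader(kafka.ReaderConfig{
              Brokers:   []string{"localhost:19092"},
              Topic:     "message",
              Partition: 2,
          })
          defer r.Close()

          m, err := r.ReadMessage(context.Background())
          if err != nil {
              log.Fatal(err)
          }
          log.Printf("partition=%d offset=%d value=%s", m.Partition, m.Offset, m.Value)
      }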
  14. Consumer Group
    • A consumer group is like a label for a group of consumers.
    • Consumers under the same label each consume different messages.
  15. Let’s Imagine It
    • It’s like when you are at school and your teacher asks you to form groups of 3.
    • Then the teacher gives each group a name, for example group 1 is named "Tiger", another one "Apple", and so on.
    • From then on, you and your 2 friends are recognized as one entity rather than 3 entities.
    • For example, the first group is assigned to sweep the classroom. Those 3 pupils may split the task: one sweeps the left side, another the middle, and the last one the right side.
  16. Just like Japanese pupils do...
    Students wipe the floor. Source: http://lucky-japan.blogspot.com/2014/10/japanese-students-clean-classrooms-on.html
  17. Consumer Group In Picture
    Kafka consumer group. Source: https://kafka.apache.org/documentation/#design
    Each consumer group consumes different messages from different partitions, as in the sketch below.
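    A minimal sketch of that split, again assuming segmentio/kafka-go and a broker at localhost:19092; the group name "group-tiger" is just an illustrative label:

      package main

      import (
          "context"
          "log"

          "github.com/segmentio/kafka-go"
      )

      func main() {
          // Two consumers sharing one group ID: Kafka assigns each of them
          // a disjoint subset of the topic's partitions.
          for i := 0; i < 2; i++ {
              go func(id int) {
                  r := kafka.NewReader(kafka.ReaderConfig{
                      Brokers: []string{"localhost:19092"},
                      GroupID: "group-tiger", // the shared "label"
                      Topic:   "message",
                  })
                  defer r.Close()
                  for {
                      m, err := r.ReadMessage(context.Background())
                      if err != nil {
                          log.Println(err)
                          return
                      }
                      log.Printf("consumer %d got partition=%d offset=%d", id, m.Partition, m.Offset)
                  }
              }(i)
          }
          select {} // block forever; the consumers keep running in the background
      }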
  18. It can publish and subscribe to messages
    Publish and Subscribe. One of Kafka’s use cases is messaging. Copied from its website: “Kafka works well as a replacement for a more traditional message broker. Message brokers are used for a variety of reasons (to decouple processing from data producers, to buffer unprocessed messages, etc).”
  19. It’s Scalable
    Kafka can be scaled by adding new brokers to the same cluster. We can also add more partitions to a topic, so it can be consumed by more workers. Don’t forget to give each broker a different ID, or they will be identical. (http://tech.gc.com/scaling-with-kafka/)
    You can also read:
    • https://engineering.linkedin.com/kafka/running-kafka-scale
  20. It’s Fast
    Unlike most modern databases, which use tree structures, Kafka uses a queue data structure. This makes writes O(1), since it just appends data to a partition.
    You can also read:
    • https://www.quora.com/Why-does-Kafka-scale-better-than-other-messaging-systems-like-RabbitMQ
    • http://searene.me/2017/07/09/Why-is-Kafka-so-fast/
    • https://medium.freecodecamp.org/what-makes-apache-kafka-so-fast-a8d4f94ab145
  21. Using Docker Compose
    Define docker-compose.yml like this:

      version: '2'
      services:
        zk1:
          image: confluentinc/cp-zookeeper:5.0.0
          container_name: zk1
          ports:
            - "22181:22181"
          environment:
            ZOOKEEPER_SERVER_ID: 1
            ZOOKEEPER_CLIENT_PORT: 22181
            ZOOKEEPER_TICK_TIME: 2000
            ZOOKEEPER_INIT_LIMIT: 5
            ZOOKEEPER_SYNC_LIMIT: 2
            ZOOKEEPER_SERVERS: zk1:22888:23888
        kafka1:
          image: confluentinc/cp-kafka:5.0.0
          container_name: kafka1
          ports:
            - "19092:19092"
          depends_on:
            - zk1
          environment:
            KAFKA_BROKER_ID: 1
            KAFKA_ZOOKEEPER_CONNECT: ${MY_IP}:22181
            KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://${MY_IP}:19092

    Check your local IP with: ifconfig -a
    Then run with: MY_IP=your-ip docker-compose up
  22. Create the topic
    Create the topic "message" with 4 partitions and a replication factor of 2:

      docker run --net=host --rm confluentinc/cp-kafka:5.0.0 \
        kafka-topics --create --topic message \
        --partitions 4 \
        --replication-factor 2 \
        --if-not-exists \
        --zookeeper localhost:22181
  23. Create the producer
    • It’s just like common API handling that accepts the request, validates it, and then saves the data, except that the data now goes into Kafka instead of the database (a sketch follows below).
    • Later, rather than ending the data’s life-cycle in Kafka, we create a worker that consumes it and writes it into the database.
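    A minimal producer sketch under the same assumptions (segmentio/kafka-go, broker at localhost:19092, the "message" topic created above); the Comment struct and the key are illustrative only:

      package main

      import (
          "context"
          "encoding/json"
          "log"

          "github.com/segmentio/kafka-go"
      )

      type Comment struct {
          User string `json:"user"`
          Text string `json:"text"`
      }

      func main() {
          w := kafka.NewWriter(kafka.WriterConfig{
              Brokers: []string{"localhost:19092"},
              Topic:   "message",
          })
          defer w.Close()

          // In the real API this payload would come from the validated HTTP request.
          body, err := json.Marshal(Comment{User: "user-1", Text: "hello kafka"})
          if err != nil {
              log.Fatal(err)
          }

          err = w.WriteMessages(context.Background(), kafka.Message{
              Key:   []byte("user-1"), // messages with the same key land on the same partition
              Value: body,
          })
          if err != nil {
              log.Fatal("failed to publish:", err)
          }
      }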
  24. Create the consumer as a worker
    • Writing the consumer is much simpler. The main idea is just to retrieve data from Kafka and then process it. Each consumer group can do different work (a sketch follows below).
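    A minimal worker sketch under the same assumptions; the group ID "db-writer" is an illustrative name:

      package main

      import (
          "context"
          "log"

          "github.com/segmentio/kafka-go"
      )

      func main() {
          r := kafka.NewReader(kafka.ReaderConfig{
              Brokers: []string{"localhost:19092"},
              GroupID: "db-writer", // a different group could do different work on the same topic
              Topic:   "message",
          })
          defer r.Close()

          for {
              m, err := r.ReadMessage(context.Background())
              if err != nil {
                  log.Fatal(err)
              }
              // Process the message here, e.g. collect a batch and insert it into the database.
              log.Printf("worker got partition=%d offset=%d value=%s", m.Partition, m.Offset, m.Value)
          }
      }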