My 2 Weeks Experience Trying Kafka with Go

This is the presentation I gave about Apache Kafka at Qiscus Tech Talk #113 at Block71 Yogyakarta. You can expect working sample code in this presentation, and you can also read the blog version here: https://medium.com/@yusufs/getting-started-with-kafka-in-golang-14ccab5fa26

Yusuf Syaifudin

November 28, 2018

Transcript

  1. Today’s (glorious) blather
    01 The Background
    02 What Is Kafka?
    03 Why Kafka?
    04 Running a Kafka Cluster
    05 Creating a Producer and Consumer with Go
  2. Background Question
    • Why is our API (particularly the API that writes directly into the DB) so slow?
    • How about moving all the time-consuming tasks into a worker?
  3. Why Is Our API So Slow?
    Most likely because we write directly into the database. In addition, we run all the processing for sending push notifications, webhooks, and caching the comment object right after we receive the request. Even though it runs in goroutines, it is all done on the same machine.
  4. Background Question
    • We need a fast storage engine that persists to disk to save our request log.
    • We need to be able to query that data once we have it.
  5. We tried some tools, but they didn’t work
    • We gave Elasticsearch a try, but it took our service down whenever Elasticsearch was unreachable.
    • We moved to Papertrail, but we couldn’t query on specific fields.
    • We tried building a Papertrail-like service that accepts data via UDP, but whether it inserts into an RDBMS or a NoSQL store, it can still easily fail when it hits thousands of inserts per second.
  6. Why Is Our API So Slow?
    Image source: https://cheesecakelabs.com/blog/simple-chat-architecture-mvp/
    Suppose we have a system that handles data like this. (PS: we use neither Django nor NodeJS; this is just an illustration.)
  7. What’s the problem?
    • Writing directly into the database means you keep having to add more resources (whether scaling vertically or horizontally), and it is easy to hit a ceiling when the request volume gets really big. We need a streaming platform.
    • Instead of writing directly into the database, we can store the data temporarily and later consume it in batches to save it into persistent storage.
    • Some tools for queueing messages: RabbitMQ, Kafka.
  8. How about moving all the time-consuming tasks into a worker?
    So, we need a message queue that meets the following criteria:
    • It must have a publish-subscribe mechanism.
    • It must be scalable.
    • Writes must be faster than a DB (or at least our current DB).
  9. So, let’s give Apache Kafka a try to handle this problem
    Instead of writing directly into each service (Redis, DB, log), we need a streaming tool to stream our data: Kafka.
  10. What Is Apache Kafka?
    • Apache Kafka is an open-source stream-processing software platform that started out at LinkedIn.
    • It’s written in Scala and Java.
    • Kafka is based on a commit log, similar to what a common RDBMS uses internally.
  11. Commit Log
    • It’s just like an ordinary queue, in which each message is served in FIFO order.
    • There is no further processing that adds overhead the way a DB does, such as locking tables, checking foreign keys, building indexes, etc.
  12. But, not just one
    For each topic, the Kafka cluster maintains a partitioned log that looks like this.
    Source: https://kafka.apache.org/documentation/#design
  13. Consuming Messages from Kafka
    • In Kafka you can consume data from a specific partition of a topic, or you can consume from all partitions (see the sketch below).
    • Interestingly, you can subscribe to the data with several clients/workers and have each of them retrieve different data from different partitions by using a consumer group. So, what is a consumer group?
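    A minimal sketch of reading from one specific partition, assuming the segmentio/kafka-go client and a broker at localhost:19092; the actual sample code may use a different library:

      package main

      import (
          "context"
          "log"

          "github.com/segmentio/kafka-go"
      )

      func main() {
          // Read only from partition 2 of the "message" topic (no consumer group).
          r := kafka.NewReader(kafka.ReaderConfig{
              Brokers:   []string{"localhost:19092"},
              Topic:     "message",
              Partition: 2,
          })
          defer r.Close()

          m, err := r.ReadMessage(context.Background())
          if err != nil {
              log.Fatal(err)
          }
          log.Printf("partition=%d offset=%d value=%s", m.Partition, m.Offset, m.Value)
      }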
  14. Consumer Group
    • A consumer group is like a label for a group of consumers.
    • Consumers under the same label each consume different messages.
  15. Let’s Imagine It
    • It’s like when you are at school and your teacher asks you to form groups of 3.
    • Then the teacher gives each group a name, for example group 1 is named "Tiger", another one "Apple", and so on.
    • From then on, you and your 2 friends are recognized as one entity rather than 3 entities.
    • For example, the first group is assigned to sweep the classroom. Those 3 pupils may split the task: one sweeps the left side, another the middle, and the last one the right side.
  16. Just like Japanese pupils do...
    Students wipe the floor. Source: http://lucky-japan.blogspot.com/2014/10/japanese-students-clean-classrooms-on.html
  17. Consumer Group In Picture
    Kafka consumer group. Source: https://kafka.apache.org/documentation/#design
    Each consumer group consumes different messages from different partitions, as in the sketch below.
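    A minimal sketch of that split, again assuming segmentio/kafka-go and a broker at localhost:19092; the group name "group-tiger" is just an illustrative label:

      package main

      import (
          "context"
          "log"

          "github.com/segmentio/kafka-go"
      )

      func main() {
          // Two consumers sharing one group ID: Kafka assigns each of them
          // a disjoint subset of the topic's partitions.
          for i := 0; i < 2; i++ {
              go func(id int) {
                  r := kafka.NewReader(kafka.ReaderConfig{
                      Brokers: []string{"localhost:19092"},
                      GroupID: "group-tiger", // the shared "label"
                      Topic:   "message",
                  })
                  defer r.Close()
                  for {
                      m, err := r.ReadMessage(context.Background())
                      if err != nil {
                          log.Println(err)
                          return
                      }
                      log.Printf("consumer %d got partition=%d offset=%d", id, m.Partition, m.Offset)
                  }
              }(i)
          }
          select {} // block forever; the consumers keep running in the background
      }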
  18. It can publish and subscribe to messages
    Publish and Subscribe. One of Kafka’s use cases is messaging. Copied from its website: “Kafka works well as a replacement for a more traditional message broker. Message brokers are used for a variety of reasons (to decouple processing from data producers, to buffer unprocessed messages, etc).”
  19. It’s Scalable
    Kafka can be scaled by adding new brokers to the same cluster. We can also add more partitions to a topic, so it can be consumed by more workers. Don’t forget to give each broker a different ID, or they will be identical. (http://tech.gc.com/scaling-with-kafka/)
    You can also read:
    • https://engineering.linkedin.com/kafka/running-kafka-scale
  20. It’s Fast
    Unlike most modern databases, which use tree structures, Kafka uses a queue data structure. This makes writes O(1), since it just appends data to a partition.
    You can also read:
    • https://www.quora.com/Why-does-Kafka-scale-better-than-other-messaging-systems-like-RabbitMQ
    • http://searene.me/2017/07/09/Why-is-Kafka-so-fast/
    • https://medium.freecodecamp.org/what-makes-apache-kafka-so-fast-a8d4f94ab145
  21. Using Docker Compose
    Define docker-compose.yml like this:

      version: '2'
      services:
        zk1:
          image: confluentinc/cp-zookeeper:5.0.0
          container_name: zk1
          ports:
            - "22181:22181"
          environment:
            ZOOKEEPER_SERVER_ID: 1
            ZOOKEEPER_CLIENT_PORT: 22181
            ZOOKEEPER_TICK_TIME: 2000
            ZOOKEEPER_INIT_LIMIT: 5
            ZOOKEEPER_SYNC_LIMIT: 2
            ZOOKEEPER_SERVERS: zk1:22888:23888
        kafka1:
          image: confluentinc/cp-kafka:5.0.0
          container_name: kafka1
          ports:
            - "19092:19092"
          depends_on:
            - zk1
          environment:
            KAFKA_BROKER_ID: 1
            KAFKA_ZOOKEEPER_CONNECT: ${MY_IP}:22181
            KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://${MY_IP}:19092

    Check your local IP with: ifconfig -a
    Then run with: MY_IP=your-ip docker-compose up
  22. Create the topic
    Create the topic "message" with 4 partitions and a replication factor of 2:

      docker run --net=host --rm confluentinc/cp-kafka:5.0.0 \
        kafka-topics --create --topic message \
        --partitions 4 \
        --replication-factor 2 \
        --if-not-exists \
        --zookeeper localhost:22181
  23. Create the producer
    • It’s just like common API handling that accepts the request, validates it, and then saves the data, except that the data now goes into Kafka instead of the database (a sketch follows below).
    • Later, rather than ending the data’s life-cycle in Kafka, we create a worker that consumes it and writes it into the database.
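    A minimal producer sketch under the same assumptions (segmentio/kafka-go, broker at localhost:19092, the "message" topic created above); the Comment struct and the key are illustrative only:

      package main

      import (
          "context"
          "encoding/json"
          "log"

          "github.com/segmentio/kafka-go"
      )

      type Comment struct {
          User string `json:"user"`
          Text string `json:"text"`
      }

      func main() {
          w := kafka.NewWriter(kafka.WriterConfig{
              Brokers: []string{"localhost:19092"},
              Topic:   "message",
          })
          defer w.Close()

          // In the real API this payload would come from the validated HTTP request.
          body, err := json.Marshal(Comment{User: "user-1", Text: "hello kafka"})
          if err != nil {
              log.Fatal(err)
          }

          err = w.WriteMessages(context.Background(), kafka.Message{
              Key:   []byte("user-1"), // messages with the same key land on the same partition
              Value: body,
          })
          if err != nil {
              log.Fatal("failed to publish:", err)
          }
      }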
  24. Create the consumer as a worker
    • Writing the consumer is much simpler. The main idea is just to retrieve data from Kafka and then process it. Each consumer group can do different work (a sketch follows below).
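    A minimal worker sketch under the same assumptions; the group ID "db-writer" is an illustrative name:

      package main

      import (
          "context"
          "log"

          "github.com/segmentio/kafka-go"
      )

      func main() {
          r := kafka.NewReader(kafka.ReaderConfig{
              Brokers: []string{"localhost:19092"},
              GroupID: "db-writer", // a different group could do different work on the same topic
              Topic:   "message",
          })
          defer r.Close()

          for {
              m, err := r.ReadMessage(context.Background())
              if err != nil {
                  log.Fatal(err)
              }
              // Process the message here, e.g. collect a batch and insert it into the database.
              log.Printf("worker got partition=%d offset=%d value=%s", m.Partition, m.Offset, m.Value)
          }
      }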