
10 ways to deploy Apache Kafka® and have fun along the way

If you have ever been involved in deploying an Apache Kafka cluster,
I’m sure you have faced the question: how do I deploy it? Whether you
run everything in one data center, across multiple data centers, or
in the cloud, Apache Kafka gives you the flexibility to adapt to your
situation.

In this talk we’re going to review the different scenarios you might
face when installing a new cluster: a single cluster, many clusters,
and stretched clusters. For each one we’ll review the pros, the cons,
and the guarantees we can expect.

By the end of this talk you will know how to deploy Apache Kafka in
each of these situations and get the most out of your deployment,
with a clear understanding of how the cluster is going to behave.

Pere Urbón

June 05, 2019

Transcript

  1. 1
    Deploying Apache Kafka, a
    journey recap
    Pere Urbon-Bayes
    @purbon
    Technology Architect
    Confluent

  2. 2
    Topics for today
    1. Apache Kafka, the different components
    2. Deployment situations
    1. Single Data Center
    2. Multi Data Center
    1. Active – Passive
    2. Active - Active
    3. Stretched Cluster
    1. 3 DC
    2. 2 DC
    3. 2.5 DC
    4. The cloud, or someone else’s machines

  3. 3
    Apache Kafka
    internals report

  4. 4
    What is Kafka?

  5. 5
    Apache Kafka, a distributed system

  6. 7
    Understanding the process of a Request

  7. 8
    Deploying Apache Kafka
    (and Apache Zookeeper)

  8. 9
    Ways to deploy a fresh and shiny Kafka Cluster
    ● Manually, you probably are not considering this.
    ○ Available as rpm, deb, zip and tar.gz
    ○ https://docs.confluent.io/current/installation/index.html
    ● Infrastructure as Code:
    ○ Ansible: https://github.com/confluentinc/cp-ansible
    ○ Puppet: https://forge.puppet.com/modules?utf-8=%E2%9C%93&page_size=25&sort=rank&q=confluent
    ○ Chef: https://supermarket.chef.io/cookbooks?utf8=%E2%9C%93&q=confluent&platforms%5B%5D= (a bit outdated)
    ○ Terraform:
    ■ https://github.com/Mongey/terraform-provider-kafka
    ■ https://github.com/astubbs/cp-cluster-multi-region-terraform
    ● Also available on Docker Hub (a quick connectivity check is sketched below).
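
    Whichever route you pick, once the brokers are up it helps to verify the cluster is reachable
    before pointing applications at it. A minimal sketch with the Java AdminClient; the bootstrap
    address is an assumption, adjust it to your environment:

        import java.util.Properties;
        import org.apache.kafka.clients.admin.AdminClient;
        import org.apache.kafka.clients.admin.AdminClientConfig;
        import org.apache.kafka.clients.admin.DescribeClusterResult;

        public class ClusterCheck {
            public static void main(String[] args) throws Exception {
                Properties props = new Properties();
                // Hypothetical bootstrap address; point it at your own brokers.
                props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1:9092");

                try (AdminClient admin = AdminClient.create(props)) {
                    DescribeClusterResult cluster = admin.describeCluster();
                    // Print the cluster id and its current members.
                    System.out.println("Cluster id: " + cluster.clusterId().get());
                    cluster.nodes().get().forEach(node -> System.out.println("Broker: " + node));
                }
            }
        }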

  9. 10
    1 Data Center

  10. 11
    1 Data Center
    ● Full deployment in a single location (data center).
    ● Good latency numbers; all the relevant actors are nearby.
    ● In case of data center problems, it is all or nothing.
    ○ But probably all your other apps are having problems as well.

  11. 12
    Single Cluster
    Deployment
    ● A ZooKeeper ensemble needs a majority (floor(N/2) + 1 of N nodes) up, so it tolerates floor((N-1)/2) failures
    ○ 3 nodes (tolerates 1 failure)
    ○ 5 nodes (tolerates 2 failures)
    ○ Do I need more? …..
    ● N brokers
    ● The challenge of co-location
    ● Rack awareness

  12. 13
    1 Data center
    Thinking about where to deploy your Apache Kafka cluster?
    ○ ZooKeeper is well known to be sensitive to latency. Please try to avoid deploying Apache Kafka
    on any type of SAN.
    ○ Long pauses, for example due to GC, might make ZooKeeper think a broker is dead (when it is
    only paused).
    ■ If you run VMware with vMotion, test the latency impact while vMotion is active.
    ○ Apache Kafka does not need a lot of Java heap; it uses zero-copy.
    ○ If possible, use SSDs for your ZooKeeper deployment

  13. 14
    1 Data center
    Thinking about where to deploy your Apache Kafka cluster?
    ○ RAID or JBOD?
    ○ Using virtualization
    ■ Where are your VMs hosted?
    ■ Watch out for noisy neighbors
    ○ Using a SAN?

  14. 15
    1 Data center
    Have you added your monitoring?

  15. 16
    1 Data center (security)
    ● Using security with TLS (a client-side configuration sketch follows)
    ○ Zero-copy is no longer an option
    ○ Increased Java heap requirements, min. 4 GB
    ○ Throughput impact (around 30%), much reduced with Java 11
    ● Handling certificate revocation (CRL or OCSP) has to go through the JVM
    ● Managing the JVM key and trust stores and their updates [KIP-226]
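
    For reference, the client side of a TLS-enabled cluster is mostly configuration. A minimal
    producer sketch, assuming an SSL listener on port 9093 and locally provisioned JKS stores
    (paths, passwords and the topic name are placeholders):

        import java.util.Properties;
        import org.apache.kafka.clients.producer.KafkaProducer;
        import org.apache.kafka.clients.producer.ProducerRecord;
        import org.apache.kafka.common.serialization.StringSerializer;

        public class TlsProducer {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put("bootstrap.servers", "broker-1:9093");  // assumed SSL listener
                props.put("security.protocol", "SSL");
                props.put("ssl.truststore.location", "/etc/kafka/secrets/client.truststore.jks");
                props.put("ssl.truststore.password", "changeit");  // placeholder
                // Only needed when brokers require client authentication (ssl.client.auth=required).
                props.put("ssl.keystore.location", "/etc/kafka/secrets/client.keystore.jks");
                props.put("ssl.keystore.password", "changeit");    // placeholder
                props.put("key.serializer", StringSerializer.class.getName());
                props.put("value.serializer", StringSerializer.class.getName());

                try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                    producer.send(new ProducerRecord<>("test-topic", "key", "hello over TLS"));
                }
            }
        }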

  16. 17
    1 Data center (but this applies to every deployment)
    Understanding the moving parts.
    ○ Topics have partitions, with:
    ■ Leaders: each partition has one leader and many followers
    ■ replication.factor: how many copies of each partition will be created
    ■ min.insync.replicas: the minimum number of replicas that are required to be in sync
    ○ acks: the number of replicas that need to acknowledge receiving a message
    ○ Batching: the producer will batch a number of messages together to increase performance
    ○ Retries: if something goes wrong, messages will be retried up to a certain limit (see the
    producer sketch below)
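
    A hedged producer sketch that puts these settings together, tuned towards durability; the
    bootstrap address, topic name and batch sizes are illustrative, not recommendations:

        import java.util.Properties;
        import org.apache.kafka.clients.producer.KafkaProducer;
        import org.apache.kafka.clients.producer.ProducerConfig;
        import org.apache.kafka.clients.producer.ProducerRecord;
        import org.apache.kafka.common.serialization.StringSerializer;

        public class DurableProducer {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1:9092"); // assumed address
                props.put(ProducerConfig.ACKS_CONFIG, "all");               // wait for the in-sync replicas
                props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
                props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);  // avoid duplicates on retry
                props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32 * 1024);     // up to 32 KB per batch
                props.put(ProducerConfig.LINGER_MS_CONFIG, 10);             // wait up to 10 ms to fill a batch
                props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
                props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

                try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                    producer.send(new ProducerRecord<>("orders", "order-1", "payload"));
                }
            }
        }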

  17. 18
    Multi data center

  18. 19
    Active - Passive
    ● The active side is the main cluster where your applications connect.
    ● There is a standby, or follower, cluster where data is replicated to (the replication flow is
    sketched below).
    ● This configuration is good for:
    ○ Disaster recovery (natural failover)
    ○ You can leverage the follower cluster for offline workloads
    ● On the other hand, this adds challenges:
    ○ Maintenance and monitoring burden
    ○ Hardware cost
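
    Cross-cluster replication is normally handled by a dedicated tool (MirrorMaker, Confluent
    Replicator), but conceptually it boils down to consuming from the active cluster and producing
    to the passive one. A toy sketch to illustrate the data flow only; cluster addresses and the
    topic are assumptions, and this is not a substitute for a real replication tool:

        import java.time.Duration;
        import java.util.Collections;
        import java.util.Properties;
        import org.apache.kafka.clients.consumer.ConsumerRecord;
        import org.apache.kafka.clients.consumer.KafkaConsumer;
        import org.apache.kafka.clients.producer.KafkaProducer;
        import org.apache.kafka.clients.producer.ProducerRecord;

        public class ToyMirror {
            public static void main(String[] args) {
                Properties c = new Properties();
                c.put("bootstrap.servers", "active-dc:9092");   // assumed active cluster
                c.put("group.id", "toy-mirror");
                c.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
                c.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

                Properties p = new Properties();
                p.put("bootstrap.servers", "passive-dc:9092");  // assumed passive cluster
                p.put("acks", "all");
                p.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
                p.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

                try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(c);
                     KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(p)) {
                    consumer.subscribe(Collections.singletonList("orders"));
                    while (true) {
                        for (ConsumerRecord<byte[], byte[]> r : consumer.poll(Duration.ofMillis(500))) {
                            // Forward each record to the same topic on the passive cluster.
                            producer.send(new ProducerRecord<>(r.topic(), r.key(), r.value()));
                        }
                    }
                }
            }
        }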

  19. 21
    Active - Active
    ● There are two, or more, clusters where “online” applications are writing and
    reading data.
    ● Both clusters are now better utilized.
    ● Replication needs to be set up in both directions now (namespace the replicated topics, e.g.
    with a cluster prefix, to avoid loops).
    ● In case of disaster recovery this mode adds extra failover capacity.

  20. 23
    Stretching your cluster

  21. 24
    Stretching your cluster
    Very important:
    Do you know what you’re doing? Think again,
    lots of things can go wrong with this
    architecture.

  22. 25
    Stretching your cluster
    ● A stretched cluster can exist in different setups:
    ○ Over 3 data centers.
    ■ With brokers and ZooKeepers in each location.
    ■ With brokers in 2 locations and ZooKeepers in 3 (two and a half locations).
    ○ Over 2 data centers. What happens now with consensus?
    ● You might want to stretch your cluster to make Apache Kafka’s internal replication
    work for you.

  23. 26
    3 Data Centers
    ● The most natural way to stretch a cluster is over three data centers
    ● Brokers (N) and ZooKeepers (5) are distributed across all the locations
    ● Remember that latency will be critical to the success of this deployment
    ● Replication factor and acks become very important to ensure cluster health

  24. 28
    3 Data Centers
    ● Good things about this architecture
    ○ Easy to set up; it is a single cluster over different locations.
    ○ Transparent out-of-the-box failover (included in the Apache Kafka protocol / clients)
    ○ Can survive a full DC failure without downtime.
    ● But there are challenges as well
    ○ Latency could become a problem (like in many other situations)
    ○ Clients are not location-aware; they read from their partition leaders wherever those are

  25. 29
    You can add a backup cluster to
    the 3 DC setup for recovery / backup
    purposes.

  26. 30
    2 Data Centers
    ● More common than having three data centers; not many orgs have 3 DCs.
    ● These DCs are usually nearby, with a good connection link, so lower latency.
    ● But many questions arise:
    ○ How are we going to set up coordination / quorum with ZooKeeper?
    ○ How many brokers should I have?
    ○ What happens if I lose one data center?
    ○ Is there anything the clients should be enforcing?

  27. 32
    What happens if:
    ● Zookeeper 3 in DC3 is
    unavailable (for example due to
    performance)
    How do we maintain?
    ● ISR list
    ● quorum / leader election?
    ● …
    We might try to bring an ensemble
    back, but data loss, duplication and
    divergence can easily happen.

  28. 34
    Two and a half Data Centers
    What if we could add a
    new DC under your desk?

  29. 35
    Two and a half Data Centers
    ● The most common scenario is organisations with only two data centers
    available, but what if:
    ○ You could use a third one, maybe with lower resilience?
    ○ Or the cloud?
    ● Are we going to have a more resilient deployment?

  30. 37
    Remember
    ● In a 2 DC deployment:
    ○ Use ZooKeeper hierarchical quorums to achieve consistency.
    ○ You will have to choose between availability and consistency
    ● Use acks=all and min.insync.replicas > 50% of the replicas to ensure data is replicated
    across the nodes in both data centers.
    ● Remember: rack awareness is only enforced during topic creation.
    ● If you can, avoid stretching your cluster, but if you must, use 3 data centers (a topic-creation
    sketch follows).
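
    A hedged sketch of creating a topic that matches these recommendations; the topic name,
    partition count and bootstrap address are illustrative. Note that the (rack-aware) replica
    placement is decided here, at creation time:

        import java.util.Collections;
        import java.util.Properties;
        import org.apache.kafka.clients.admin.AdminClient;
        import org.apache.kafka.clients.admin.NewTopic;

        public class CreateDurableTopic {
            public static void main(String[] args) throws Exception {
                Properties props = new Properties();
                props.put("bootstrap.servers", "broker-1:9092");  // assumed address

                try (AdminClient admin = AdminClient.create(props)) {
                    // 6 partitions, replication factor 3; with min.insync.replicas=2 and acks=all,
                    // every write must reach at least 2 of the 3 replicas before it is acknowledged.
                    NewTopic topic = new NewTopic("orders", 6, (short) 3)
                            .configs(Collections.singletonMap("min.insync.replicas", "2"));
                    // Rack / data-center placement of the replicas is fixed at this point.
                    admin.createTopics(Collections.singletonList(topic)).all().get();
                }
            }
        }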

  31. 38
    Because not many have that on premises,
    what about the cloud?

  32. 39
    CLOUD THERE IS NOT,
    ANOTHER PERSON’S
    COMPUTER IT IS

  33. 40
    Deploying in the cloud
    ● A region is a collection of nearby data centers, cross region clusters are
    discouraged.
    ● Stretch your cluster over 3 availability zones in your region, making your cluster
    more resilient.
    ● Think about your storage: network volumes (EBS) or temporary instance storage?
    ○ Faster recovery time?
    ○ Do I still need replication.factor if I have network volumes (EBS)?
    ○ Is temporary storage of any benefit?
    ● Every cloud has network usage limits; remember that replica.fetch.min.bytes can be
    of help here.

  34. 41
    Deploying in the cloud
    ● Better to benchmark to be sure; keep an eye on whether the I/O performance is what you
    expect.
    ● If latency becomes a problem, you can try increasing the ZooKeeper timeouts, but do so
    responsibly.
    ● Immutable infrastructure for your installations and upgrades.
    ● Monitoring is more important than ever:
    ○ Keep an eye on brokers that show a sudden increase in latency for produce or fetch requests
    (a probe sketch follows).
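
    One way to watch produce latency per broker is Kafka’s own JMX metrics. A minimal probe
    sketch, assuming remote JMX is enabled on the broker (for example by starting it with
    JMX_PORT=9999); the hostname and port are placeholders:

        import javax.management.MBeanServerConnection;
        import javax.management.ObjectName;
        import javax.management.remote.JMXConnector;
        import javax.management.remote.JMXConnectorFactory;
        import javax.management.remote.JMXServiceURL;

        public class ProduceLatencyProbe {
            public static void main(String[] args) throws Exception {
                JMXServiceURL url =
                    new JMXServiceURL("service:jmx:rmi:///jndi/rmi://broker-1:9999/jmxrmi");
                try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                    MBeanServerConnection mbsc = connector.getMBeanServerConnection();
                    // Total time taken to serve produce requests on this broker, in milliseconds.
                    ObjectName produceTime = new ObjectName(
                        "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce");
                    System.out.println("mean=" + mbsc.getAttribute(produceTime, "Mean")
                        + " p99=" + mbsc.getAttribute(produceTime, "99thPercentile"));
                }
            }
        }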

  35. 42
    Deploying in the cloud
    ● Doing autoscaling? With or without Kubernetes?
    ○ How do you handle volume reassignment?
    ○ What about partition reassignment?
    ○ When do you trigger it?

  36. 43
    Thanks!
    Questions?
    Pere Urbon-Bayes
    @purbon
    Technology Architect
    Confluent
