1. Apache Kafka
Apache Kafka was originally developed at LinkedIn and became an open-source Apache project in 2011. Kafka is a message queuing system written in Java and Scala. It is a distributed publish-subscribe messaging system designed to be fast, scalable, and durable.
Kafka has four core APIs:
• The Producer API allows an application to publish a stream of records to one or more Kafka topics.
• The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.
• The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming input streams into output streams.
• The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems.
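To make the Producer API concrete, here is a minimal Java sketch that publishes a single record. The broker address (localhost:9092) and topic name (muleesb) are assumptions that match the defaults and topic used later in this article.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumes a single broker running on the default port.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<>(props);
        // Publish one record to the topic; the key is optional but drives partitioning.
        producer.send(new ProducerRecord<>("muleesb", "key1", "Hello Kafka"));
        producer.close();
    }
}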
2. Kafka Terminology
• Topic: the name of a category or feed to which records are published. Topics are always multi-subscriber: a topic can have zero or more consumers that subscribe to the data written to it.
• Producers publish data to topics of their choice. A producer can publish data to one or more Kafka topics.
• Consumers consume data from topics. A consumer subscribes to one or more topics and consumes published messages by pulling data from the brokers.
• Partition: topics may have many partitions, so a topic can handle an arbitrary amount of data.
• Partition offset: each message within a partition has a unique sequential id known as its offset.
• Brokers are simple systems responsible for maintaining the published data. Each broker may have zero or more partitions per topic.
• Kafka cluster: a set of one or more Kafka brokers is called a Kafka cluster.
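The consumer side of these concepts can be sketched in Java as follows: the consumer subscribes to a topic and pulls records from the brokers, and each record carries the partition and offset it came from. The group id and topic name are assumptions.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group"); // consumers sharing a group id split the partitions
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("muleesb"));
        while (true) {
            // Pull messages from the brokers; each record knows its partition and offset.
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
        }
    }
}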
3. Use Cases
Below are some use cases where Apache Kafka can be considered.
3.1 Messaging
In comparison to other messaging systems, Apache Kafka offers better throughput, built-in partitioning, replication, and fault tolerance, which makes it a good solution for large-scale message processing applications.
3.2 Website Activity Tracking
Website activity (page views, searches, or any other actions users may perform) is published to central topics, with one topic per activity type. These feeds are available for subscription for a range of use cases, including real-time processing, real-time monitoring, and loading into Hadoop or offline data warehousing systems for offline processing and reporting.
3.3 Metrics
Kafka is often used for operational monitoring data. This involves aggregating statistics from distributed applications to produce centralized feeds of operational data.
3.4 Log Aggregation
Kafka can be used across an organization to collect logs from multiple services and make them available in a standard format to multiple consumers.
3.5 Stream Processing
Popular frameworks such as Storm and Spark Streaming read data from a topic, process it, and write the processed data to a new topic, where it becomes available for users and applications. Kafka's strong durability is also very useful in the context of stream processing.
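As a minimal illustration of the read-process-write pattern described in 3.5, here is a sketch using Kafka's own Streams API rather than Storm or Spark Streaming. The topic names are assumptions, and this API requires a newer Kafka client than the 0.9 release installed later in this article.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UppercaseStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Consume from an input topic, transform each value, produce to an output topic.
        KStream<String, String> source = builder.stream("input-topic");
        source.mapValues(value -> value.toUpperCase()).to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}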
4. Setup Zookeeper on Windows Server
Now you will learn how to set up Zookeeper on Windows Server. Make sure JRE 8 is installed and the JAVA_HOME path is set up as an environment variable.
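You can verify both prerequisites from a command prompt:

java -version
echo %JAVA_HOME%

If JAVA_HOME is not set, it can be added with setx JAVA_HOME "C:\Program Files\Java\jre1.8.0_161" (the exact path is an assumption; use your actual install location).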
4.1 Download & Install Zookeeper
Zookeeper also plays a vital role in serving many other purposes, such as leader detection, configuration management, synchronization, and detecting when a node joins or leaves the cluster.
• Download ZooKeeper from http://zookeeper.apache.org/releases.html and extract it (e.g. zookeeper-3.4.10).
• Go to your Zookeeper conf directory (e.g. C:\zookeeper-3.4.10\conf).
• Rename the file zoo_sample.cfg to zoo.cfg.
• Open the zoo.cfg file in a text editor like Notepad or Notepad++.
• Search for dataDir=/tmp/zookeeper and update the path to dataDir=C:\zookeeper-3.4.10\data.
• Add two environment variables:
a. Add a System Variable ZOOKEEPER_HOME = C:\zookeeper-3.4.10
b. Edit the System Variable named Path and append ;%ZOOKEEPER_HOME%\bin;
• By default Zookeeper runs on port 2181, but you can change the port by editing zoo.cfg.
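After these edits, the relevant lines of zoo.cfg should look roughly like this (a sketch; tickTime and the other defaults come from zoo_sample.cfg):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=C:\zookeeper-3.4.10\data
clientPort=2181

With ZOOKEEPER_HOME\bin on the Path, Zookeeper can then be started from a new command prompt with:

zkserver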
5. Setup Apache Kafka on Windows Server
Now you will learn how to set up Apache Kafka on Windows Server.
5.1 Download & Install Apache Kafka
• Download Apache Kafka from http://kafka.apache.org/downloads.html and extract it (e.g. kafka_2.11-0.9.0.0).
• Go to your Kafka config directory (e.g. C:\kafka_2.11-0.9.0.0\config).
• Open the file server.properties in a text editor like Notepad or Notepad++.
• Search for log.dirs=/tmp/kafka-logs and update the path to log.dirs=C:\kafka_2.11-0.9.0.0\kafka-logs.
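With server.properties updated, and Zookeeper already running, the Kafka server can be started from the Kafka root directory (C:\kafka_2.11-0.9.0.0):

.\bin\windows\kafka-server-start.bat .\config\server.properties

The broker listens on port 9092 by default; this can also be changed in server.properties.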
6. Create Topic on Apache Kafka Server
Now we will create a topic with replication factor 1, as only one Kafka server is running.
• Open a command prompt and make sure you are at the path C:\kafka_2.11-0.9.0.0\bin\windows.
• Run the command below to create the topic:
kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic muleesb
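You can verify that the topic was created, and exercise it before involving Mule, using the console tools shipped in the same bin\windows directory:

kafka-topics.bat --list --zookeeper localhost:2181
kafka-console-producer.bat --broker-list localhost:9092 --topic muleesb
kafka-console-consumer.bat --zookeeper localhost:2181 --topic muleesb --from-beginning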
7. Kafka Connector
Streamline business processes and move data between Kafka and enterprise applications and services with the Anypoint Connector for Kafka. The Kafka connector enables out-of-the-box connectivity with Kafka, allowing users to ingest real-time data from Kafka and publish it to Kafka.
By default, the Kafka connector is not part of the Mule palette; you can install it by connecting to Anypoint Exchange from Anypoint Studio. You just need to accept the license agreement, and at the end of the installation it will ask you to restart Anypoint Studio.
8. Publishing Messages With Mule ESB as Producer
We will implement a flow that publishes a message to the Apache Kafka server.
• Place the HTTP connector at the message source and configure it.
• Drag and drop the Apache Kafka connector and configure it by clicking the add button. Configure Bootstrap Servers, Producer Properties File, and Consumer Properties File. Press OK.
• Configure Operation to Producer, the Topic name, and a Key (a unique key that is published along with the message).
• Add the consumer.properties and producer.properties files to the Mule application build path (src/main/resources). Both properties files can be found at C:\kafka_2.11-0.9.0.0\config; a sample is sketched below.
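For reference, a minimal producer.properties for this setup might look like the following sketch (not a copy of the bundled file, which contains more settings; bootstrap.servers must match your broker):

bootstrap.servers=localhost:9092
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer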
9. Consuming Messages With Mule ESB as Consumer
We will implement a flow that consumes messages from the Apache Kafka server.
• Place the Apache Kafka connector at the message source and configure it by clicking the add button. Configure Bootstrap Servers, Producer Properties File, and Consumer Properties File as shown above. Press OK.
• Configure Operation to Consumer, the Topic name, and Partitions (the number of partitions you specified when creating the topic). A minimal consumer.properties is sketched below.
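Similarly, a minimal consumer.properties might look like this sketch (the group id is an assumption; any consumer group name works):

bootstrap.servers=localhost:9092
group.id=mule-consumer-group
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer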
10. Test the Application
We will use Postman to test the application. Send a POST request to the producer flow and it will publish the message to Apache Kafka. Once the message is published, it will be consumed by the consumer flow, which saves the message to the specified directory. For more details on testing, please watch the demonstration video accompanying this slide.
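An equivalent request can also be sent from the command line. The host, port, and path below are assumptions; they depend on how the HTTP listener in the producer flow is configured:

curl -X POST -d "Hello Kafka from Mule" http://localhost:8081/producer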
11. Conclusion
Apache Kafka is a very powerful distributed, scalable, and durable message queuing system. Mule ESB provides the Apache Kafka connector, which can publish messages to the Kafka server and consume messages from it (i.e., it can act as a producer as well as a consumer).