Install an Apache Kafka KRaft cluster in Docker containers

Will Barillon
6 min read · Nov 3, 2023


This article provides a practical installation of a Kafka cluster with a separation between controllers and brokers. The demonstration runs in a containerized environment built from Docker containers.

The starting point of my journey is the Apache Kafka Quickstart.

Get java

It sounds simple, yet it isn’t. When browsing Docker Hub for a Java image, you think of openjdk right away. But that image is deprecated, with 5 official image alternatives:

All options are fine in my opinion. I picked the ibm-semeru image for reasons that are arbitrary and irrelevant to most people (ethics, simplicity of use, open source).

As for tags, I went with an Ubuntu 22.04 distribution (also named Jammy), a runtime environment (JRE) because I am not supposed to write any Java code to install Apache Kafka, and Java 17.

Which results in:

ibm-semeru-runtimes:open-17.0.7_7-jre-jammy

The bottleneck of all tutorials on Apache Kafka installation with KRaft

To understand this bottleneck, encountered by nearly all content published on the internet about the installation, you need to understand some parts of the installation process.

A Kafka cluster with KRaft is built from several servers acting as either controllers or brokers. These servers are all nodes, regardless of their role in the cluster.

Each node has its own id, and the cluster has its own id generated by random-uuid, as illustrated in step 2 of the Apache Kafka Quickstart.

Once the installation is done, a file named meta.properties is generated in the directory specified by the log.dirs property of the server’s .properties file.

In the meta.properties file, there are 2 pieces of information that matter for a trivial installation:

  • node.id
  • cluster.id

The bottleneck is to declare distinct node ids and the same cluster id. This is the reason why there is no content explaining how to set up a Kafka cluster with both brokers and controllers.

In other words, if the cluster.id is the same and the node.id values differ in the broker-side and controller-side meta.properties files, the cluster is successfully installed.
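This success condition can be sketched in shell. The file contents below are hypothetical stand-ins for what the format step writes to meta.properties; only the two keys of interest are shown, and the paths and id values are made up for illustration:

```shell
# Hypothetical broker-side meta.properties (only the keys we care about)
cat > /tmp/broker-meta.properties <<'EOF'
cluster.id=MkU3OEVBNTcwNTJENDM2Qk
node.id=2
EOF

# Hypothetical controller-side meta.properties
cat > /tmp/controller-meta.properties <<'EOF'
cluster.id=MkU3OEVBNTcwNTJENDM2Qk
node.id=1
EOF

# Extract the two keys from each file
broker_cluster=$(grep '^cluster.id=' /tmp/broker-meta.properties | cut -d= -f2)
ctrl_cluster=$(grep '^cluster.id=' /tmp/controller-meta.properties | cut -d= -f2)
broker_node=$(grep '^node.id=' /tmp/broker-meta.properties | cut -d= -f2)
ctrl_node=$(grep '^node.id=' /tmp/controller-meta.properties | cut -d= -f2)

# The cluster is consistent when cluster.id matches and node.id differs
[ "$broker_cluster" = "$ctrl_cluster" ] && [ "$broker_node" != "$ctrl_node" ] \
  && echo "cluster consistent" || echo "cluster misconfigured"
```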

Same cluster id, different node id in Dockerfile thanks to ONBUILD

What is ONBUILD?

As per the documentation:

The ONBUILD instruction adds to the image a trigger instruction to be executed at a later time, when the image is used as the base for another build. The trigger will be executed in the context of the downstream build, as if it had been inserted immediately after the FROM instruction in the downstream Dockerfile.

In other words, it is possible to create “parent” Dockerfiles that have “child” Dockerfiles, each child build taking its own context into account.
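A minimal sketch of the mechanism, independent of Kafka (the image name onbuild-demo and the echoed messages are made up for illustration):

```dockerfile
# parent/Dockerfile — build it once, e.g. `docker build -t onbuild-demo parent/`
FROM ubuntu:22.04
RUN echo "runs when the parent image is built"
ONBUILD RUN echo "deferred: runs only when a child image is built FROM this one"
```

A child Dockerfile containing nothing but `FROM onbuild-demo` will execute the deferred RUN at its own build time, in its own build context.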

If the concept of context in Docker is blurry, I suggest you read my article Build a docker image with daemonist’s knowledges.

What is the plan?

The plan is simple: generate the uuid in the parent Dockerfile and store it in the image, so that child Dockerfiles can retrieve it and use it with the .properties file.

The most convenient way to get your own .properties files is to use those from the Apache Kafka repo on GitHub:
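For example, the sample KRaft configs can be fetched straight from the repo and renamed into the build contexts used later in this article (the 3.5 branch path is an assumption; adjust it to your Kafka version):

```shell
# Fetch the sample KRaft configs from the apache/kafka GitHub repo (3.5 branch)
curl -o servers/broker/build_context/service.properties \
  https://raw.githubusercontent.com/apache/kafka/3.5/config/kraft/broker.properties
curl -o servers/controller/build_context/service.properties \
  https://raw.githubusercontent.com/apache/kafka/3.5/config/kraft/controller.properties
```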

Step-by-step demonstration of the ONBUILD mechanism

Given the following Dockerfile :

# Java Runtime Environment image, because I'm not going to mess around with java code, only use it
FROM ibm-semeru-runtimes:open-17.0.7_7-jre-jammy

WORKDIR /app

# Kafka servers do not handle the HTTP protocol natively. To set up a proper healthcheck, netcat is required
RUN apt-get update && \
# curl and ca-certificates are installed here in case the base image does not ship them
apt-get install -y --no-install-recommends netcat curl ca-certificates && \
rm -rf /var/lib/apt/lists/* && \
# Download the Kafka 3.5.1 release (built for Scala 2.13) and extract it; -O keeps the name of the downloaded file
curl -O https://downloads.apache.org/kafka/3.5.1/kafka_2.13-3.5.1.tgz && \
tar -xzf kafka_2.13-3.5.1.tgz && \
# Remove the compressed file and save around 90 MB
rm kafka_2.13-3.5.1.tgz && \
# Generate a Cluster UUID and store it in a file
kafka_2.13-3.5.1/bin/kafka-storage.sh random-uuid > /tmp/kafka_cluster_id

# Copy the build context's service.properties over Kafka's default KRaft config, overwriting the existing file
ONBUILD COPY service.properties ./kafka_2.13-3.5.1/config/kraft/service.properties

# The format command needs the cluster id in a shell variable. Variables are not transferred from one RUN instruction to the next, so reading the id, formatting and cleanup all happen in a single RUN.
ONBUILD RUN KAFKA_CLUSTER_ID=$(cat /tmp/kafka_cluster_id) && \
rm /tmp/kafka_cluster_id && \
# Format Log Directories
kafka_2.13-3.5.1/bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c kafka_2.13-3.5.1/config/kraft/service.properties

# Start the Kafka Server
ONBUILD CMD kafka_2.13-3.5.1/bin/kafka-server-start.sh kafka_2.13-3.5.1/config/kraft/service.properties

Do not use docker compose build to create the image; use the docker build command instead. Add a tag so that child Dockerfiles can identify the parent image.

docker build -t <image name> <path to Dockerfile>

docker build -t server-base ./servers

Once the build is complete, your daemon now has an image named server-base available for FROM clauses.

In this use case, the child Dockerfiles are as simple as:

FROM server-base

And yes, it is possible to add new instructions into the child Dockerfile.
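For instance, a child could expose the broker port on top of the inherited triggers (a sketch; the port matches the broker config used later in this article):

```dockerfile
FROM server-base
# The ONBUILD COPY/RUN/CMD triggers from server-base fire right after this FROM,
# then the instructions below are applied
EXPOSE 9092
```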

Apache Kafka installation

.properties files

The properties to customize are:

  • node.id: an integer or any identifier for the node;
  • controller.quorum.voters: the node id, IP address (or service name) and port of the services in charge of the controller role, in the form id@host:port;
  • listeners (broker only): the IP address (or service name) + port of the services in charge of the broker role;
  • advertised.listeners (broker only): same as listeners.

service.properties for the broker (truncated!):

# The role of this server. Setting this puts us in KRaft mode
process.roles=broker

# The node id associated with this instance's roles
node.id=2

# The connect string for the controller quorum
# controller.quorum.voters=1@172.18.0.12:9093
controller.quorum.voters=1@controller:9093

############################# Socket Server Settings #############################

# The address the socket server listens on. If not configured, the host name will be equal to the value of
# java.net.InetAddress.getCanonicalHostName(), with PLAINTEXT listener name, and port 9092.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://broker:9092

# Name of listener used for communication between brokers.
inter.broker.listener.name=PLAINTEXT

# Listener name, hostname and port the broker will advertise to clients.
# If not set, it uses the value for "listeners".
advertised.listeners=PLAINTEXT://broker:9092

service.properties for the controller (truncated!):

# The role of this server. Setting this puts us in KRaft mode
process.roles=controller

# The node id associated with this instance's roles
node.id=1

# The connect string for the controller quorum
controller.quorum.voters=1@controller:9093

docker-compose.yaml

services:

  broker:
    build:
      context: ./servers/broker/build_context
      dockerfile: Dockerfile
    ports:
      - 9092:9092
    depends_on:
      controller:
        condition: service_healthy

  controller:
    build:
      context: ./servers/controller/build_context
      dockerfile: Dockerfile
    ports:
      - 9093:9093
    healthcheck:
      test: nc -z controller 9093 || exit 1
      interval: 2s
      retries: 5
      start_period: 5s
      timeout: 10s

On the terminal: docker compose up broker
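Once both services are up, a quick smoke test is to create and list a topic through the broker (the service name and Kafka paths match the files above; the topic name is arbitrary):

```shell
# Create a topic on the running broker
docker compose exec broker \
  kafka_2.13-3.5.1/bin/kafka-topics.sh --create --topic smoke-test \
  --bootstrap-server broker:9092

# List topics to confirm the cluster answers
docker compose exec broker \
  kafka_2.13-3.5.1/bin/kafka-topics.sh --list --bootstrap-server broker:9092
```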

Conclusion

The obvious limit of this setup is the creation of multiple brokers and controllers. While it is doable, it is not reasonable.

However, it makes it possible to create an environment close to a professional one and put parts of your streaming architecture into practice (topic handling, custom Producer and Consumer classes, mocking dirty data and checking the resilience of the program, etc.).

For the moment I’m practicing with a Python application mocking 10 IoT machines, each sending one piece of data per second to the producer, without issues.
