Modeling how the system behaves when the disk runs out
Setting up a test environment using Podman containers to model disk exhaustion
It is important to understand what a system can guarantee when something unexpected happens: a server failure, a network outage, a full disk, or memory exhaustion. Will data be lost? How long will recovery take? What exactly should the SRE do?
This article describes one way to test how a system behaves when the disk runs out. I use containers as the test environment because they are a convenient way to run applications. The examples use Podman because it ships as the default container tool in RHEL, Fedora, and related distributions, but the method applies to other container runtimes as well.
The problem
The main problem is limiting the disk space available to a container. By default, the container runtime lets a container use all of the server's disk space. If a container under test fills the disk, it affects not only the system being tested but also the container runtime and every other container on the host.
To model disk exhaustion, we will use a volume of limited size and mount it into the container as its data directory.
One way to create such a volume is to create a file of the required size, format it with a file system, and mount it through a loop device onto a directory that Podman can access. The last step is to map that directory into the test container as its data directory, which can be done when the container is created.
The solution
Kafka was chosen as the example: we will test how Kafka behaves when its disk is exhausted.
Step 0. Log in to the Podman host
Log in to the server that runs the Podman containers.
Step 1. Install additional packages
sudo yum install e2fsprogs
Step 2. Create a file to store the data
mkdir -p /home/user/data/volumes
cd /home/user/data/volumes
dd if=/dev/zero of=kafka_ext4_256M.data bs=1 count=0 seek=256M
mkfs -t ext4 -q kafka_ext4_256M.data
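The dd invocation in this step copies no data (count=0) and only seeks to the 256M mark, so the result is a sparse file: its apparent size is 256 MiB, but it takes up almost no space on the host until something is written into it. A minimal sketch to see this, assuming GNU coreutils and using a throwaway temporary directory with a hypothetical file name (sparse_demo.data) that is not part of the actual setup:

```shell
# Create a sparse file the same way as in Step 2, but in a temporary directory.
demo_dir=$(mktemp -d)
dd if=/dev/zero of="$demo_dir/sparse_demo.data" bs=1 count=0 seek=256M 2>/dev/null

# Apparent size in bytes: this is the size the file system inside it will get.
apparent_bytes=$(stat -c %s "$demo_dir/sparse_demo.data")
# Space actually allocated on the host so far, in KiB.
allocated_kib=$(du -k "$demo_dir/sparse_demo.data" | cut -f1)

echo "apparent: $apparent_bytes bytes, allocated: $allocated_kib KiB"
rm -rf "$demo_dir"
```

On a host file system that supports sparse files, the allocated figure should be far below the apparent size. This also means the host must still have 256 MiB free when the test actually fills the volume.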
Step 3. Mount the created file as a loop device on a directory
mkdir kafka_ext4_256M
sudo mount -o loop,rw kafka_ext4_256M.data /home/user/data/volumes/kafka_ext4_256M
sudo chown -R user:user /home/user/data/volumes/kafka_ext4_256M
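Check the mounted directory. The sample output below was evidently captured after the test had filled the volume, which is why it shows 100% usage: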
df -h
/dev/loop0 230M 214M 1.0K 100% /home/user/data/volumes/kafka_ext4_256M
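To follow the volume as it fills up during the test, the df check can be wrapped in a small helper. This is a sketch, assuming GNU df (for the --output option); the function names volume_usage_pct and watch_volume are made up for illustration:

```shell
# Print the usage percentage (an integer without the % sign) of the
# file system that contains the given directory.
volume_usage_pct() {
    df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'
}

# Poll the given directory every 5 seconds until df reports 100% usage.
watch_volume() {
    while [ "$(volume_usage_pct "$1")" -lt 100 ]; do
        echo "$(date +%T) usage: $(volume_usage_pct "$1")%"
        sleep 5
    done
    echo "volume is full"
}
```

Run on the host in a second terminal, for example as watch_volume /home/user/data/volumes/kafka_ext4_256M. Note that df rounds the percentage, so 100% may be reported slightly before the last bytes are used.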
Step 4. Create a Podman pod
podman pod create --name kafka-test -p 2181:2181 -p 9092:9092
podman pod list
Step 5. Add a Zookeeper container to the pod
podman run -d -it --pod kafka-test --env 'ALLOW_ANONYMOUS_LOGIN=yes' --name zoo --volume /home/user/data/volumes/kafka_ext4_256M:/bitnami:rw,U,Z bitnami/zookeeper:3.8
Step 6. Add a Kafka container with the mounted volume
podman run -d -it --pod kafka-test --env 'KAFKA_CFG_ZOOKEEPER_CONNECT=zoo:2181' --env 'ALLOW_PLAINTEXT_LISTENER=yes' --name kaf --volume /home/user/data/volumes/kafka_ext4_256M:/bitnami:rw,U,Z bitnami/kafka:3.4
Step 7. Create a Kafka topic and generate a lot of messages
podman exec -it kaf bash
cd /opt/bitnami/kafka/bin
kafka-topics.sh --create --topic quickstart-events --bootstrap-server zoo:9092
kafka-topics.sh --describe --topic quickstart-events --bootstrap-server zoo:9092
for x in {1..1000000000}; do echo $x; done | kafka-console-producer.sh --topic quickstart-events --bootstrap-server zoo:9092
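The loop above sends very short messages, so it may take a while to fill the volume. A rough estimate of how many fixed-size records fit into the volume helps to size the load. The sketch below is only an upper bound: it ignores Kafka's index files and record overhead, and the Zookeeper data that shares the same volume. The function name records_to_fill is hypothetical, and the commented command assumes that the kafka-producer-perf-test.sh tool from the Kafka distribution is available in the image:

```shell
# Estimate how many records of a given size fit into a volume.
# $1: usable volume size in MiB, $2: record size in bytes.
records_to_fill() {
    echo $(( $1 * 1024 * 1024 / $2 ))
}

# 230 MiB is the size that df reported for the mounted volume.
num_records=$(records_to_fill 230 1024)
echo "about $num_records records of 1 KiB"

# Assumption: if the perf-test tool is present in the container, it could
# generate records of that size, for example:
#   kafka-producer-perf-test.sh --topic quickstart-events \
#       --num-records "$num_records" --record-size 1024 --throughput -1 \
#       --producer-props bootstrap.servers=zoo:9092
```

Because the on-disk size of a record is larger than its payload, sending this many records should be enough to exhaust the volume.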
The Kafka container wrote the following log entries:
[2023-03-29 12:24:33,455] ERROR Error while appending records to quickstart-events-0 in dir /bitnami/kafka/data (kafka.server.LogDirFailureChannel)
java.io.IOException: No space left on device
[2023-03-29 12:24:33,508] WARN Stopping serving logs in dir /bitnami/kafka/data (kafka.log.LogManager)
[2023-03-29 12:24:33,512] ERROR Shutdown broker because all log dirs in /bitnami/kafka/data have failed (kafka.log.LogManager)
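When the test is repeated or automated, the same check can be done without reading the log by eye. A minimal sketch that looks for the two messages shown above in log text read from standard input; the function name broker_died_of_full_disk is made up for illustration, and the matched strings are taken from this log excerpt, so they may differ in other Kafka versions:

```shell
# Succeeds (exit status 0) if the log text on stdin shows that the broker
# hit a full disk and shut down because all of its log dirs failed.
broker_died_of_full_disk() {
    log=$(cat)
    printf '%s\n' "$log" | grep -q 'No space left on device' &&
    printf '%s\n' "$log" | grep -q 'Shutdown broker because all log dirs'
}
```

On the host, it could be used as: podman logs kaf 2>&1 | broker_died_of_full_disk && echo "broker shut down on disk exhaustion"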
The conclusion
The experiment showed that Kafka does not merely slow down when the disk runs out: the broker stops accepting messages and shuts down completely, and the consumer application gets an error.
We have successfully modeled disk exhaustion using a Kafka container as an example. I hope you find this technique helpful for testing your own systems and their components.