Simulate Linux Kernel Faults
This document describes how to use KernelChaos to simulate Linux kernel faults. This feature injects I/O-based or memory-based faults into the specified kernel paths using BPF.
Although you can set the injection target of KernelChaos to one or several Pods, the performance of other Pods on the host will be affected, because all Pods share the same kernel.
Warning: The simulation of Linux kernel faults is disabled by default. Do not use this feature in a production environment.
- Linux kernel version >= 4.18.
- The Linux kernel configuration CONFIG_BPF_KPROBE_OVERRIDE is enabled.
- The configuration value in is true.
A simple KernelChaos configuration file is as follows:
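The manifest below is a minimal sketch of such a file, not an authoritative reference. The experiment name, both namespaces, and the `__x64_sys_mount` target function are illustrative assumptions, and `failtype: 0` is assumed to select slab allocation faults; adjust all of them to your cluster and kernel.

```yaml
# Illustrative KernelChaos manifest; names and namespaces are assumptions.
apiVersion: chaos-mesh.org/v1alpha1
kind: KernelChaos
metadata:
  name: kernel-chaos-example      # hypothetical experiment name
  namespace: chaos-testing        # hypothetical namespace for the experiment object
spec:
  mode: one                       # inject into one Pod matched by the selector
  selector:
    namespaces:
      - chaos-mount               # hypothetical namespace that holds the target Pods
  failKernRequest:
    callchain:
      - funcname: '__x64_sys_mount'   # assumed target: mount system call entry on x86-64
    failtype: 0                   # assumed to mean slab allocation faults
```

With these assumed values, the fault would be injected when a selected Pod reaches the mount system call path.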
For more configuration examples, refer to examples. You can modify these configuration examples as needed.
Configuration description:
FailKernRequest specifies the fault mode (such as kmalloc and bio). It also specifies a specific call chain path and optional filtering conditions. The configuration items are as follows:
Failtype specifies the fault type. The value options are as follows:
- ‘0’: injects the slab allocation error should_failslab.
- ‘1’: injects the memory page allocation error should_fail_alloc_page.
- ‘2’: injects the bio error should_fail_bio.
For more information on these three fault types, refer to and inject_example.
Callchain specifies a specific call chain. For example:
You can also use the function parameters as filtering rules to inject more fine-grained faults. Refer to for more information. If no call chain is specified, keep the callchain field empty, indicating that faults will be injected to any path on which slab alloc is called (for example, kmalloc).
The call chain type is a frame array, consisting of the following three parts:
- funcname, which can be found from the kernel source code or from /proc/kallsyms, such as ext4_mount.
- parameters, which is used for filtering. If you want to inject a slab error on the d_alloc_parallel(struct dentry *parent, const struct qstr *name) path with a special name bananas, you need to set parameters to struct dentry *parent, const struct qstr *name. Otherwise, omit this configuration.
- predicate, which is used to access the parameters of the frame array. Taking parameters as an example, you can set it to STRNCMP(name->name, "bananas", 8) to control the path of fault injection, or you can leave it empty so that all call paths that execute d_alloc_parallel receive the slab fault injection.
headers specifies the kernel header file you need. For example, “linux/mmzone.h” and “linux/blkdev.h”.
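To make the frame fields concrete, the fragment below sketches a failKernRequest block for the d_alloc_parallel case described above. It is an illustration under assumptions: `failtype: 0` is taken to mean slab allocation faults, and `linux/dcache.h` is assumed to be the header that the predicate needs for `struct dentry` and `struct qstr`.

```yaml
# Illustrative fragment of spec.failKernRequest; values are assumptions.
failKernRequest:
  failtype: 0                     # assumed to mean slab allocation faults
  headers:
    - 'linux/dcache.h'            # assumed header providing struct dentry and struct qstr
  callchain:
    - funcname: 'd_alloc_parallel'
      # Parameter list copied from the function signature, used only for filtering.
      parameters: 'struct dentry *parent, const struct qstr *name'
      # Restrict injection to calls whose name argument matches "bananas".
      predicate: 'STRNCMP(name->name, "bananas", 8)'
```

Dropping the predicate line would, per the description above, let every call path that executes d_alloc_parallel receive the fault.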
Use kubectl to create an experiment:
The KernelChaos feature is similar to inject.py. For more information, refer to .
A simple example is as follows:
During the fault injection, the output is as follows:
You can use container_id to limit the scope of the fault injection, but some paths trigger system-level behaviors. For example:
When failtype is 1, physical page allocation fails. If this event is triggered frequently within a short period of time (for example, while (1) {memset(malloc(1M), '1', 1M)}), the kernel's OOM killer is invoked to reclaim memory, which can affect processes outside the targeted container.