KernelChaos Experiment
Although KernelChaos targets a specific pod, the performance of other pods is also affected, depending on the specific call chain and how often it is hit. This is because all pods on the same host share the same kernel.
- Linux kernel: version >= 4.18
- CONFIG_BPF_KPROBE_OVERRIDE enabled
- in
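The first two prerequisites can be checked on a node before you run an experiment. The following is a minimal sketch, not part of Chaos Mesh. It assumes a Linux host whose kernel config is readable at /boot/config-<release> or /proc/config.gz (both locations depend on the distribution), and the helper names kernel_at_least, config_enabled, and read_kernel_config are hypothetical.

```python
import gzip
import os
import platform
import re


def kernel_at_least(release, minimum=(4, 18)):
    """Return True if a release string such as '5.4.0-42-generic' is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


def config_enabled(config_text, option="CONFIG_BPF_KPROBE_OVERRIDE"):
    """Return True if the kernel config text contains the line 'option=y'."""
    return any(line.strip() == f"{option}=y" for line in config_text.splitlines())


def read_kernel_config(release):
    """Read the kernel config from common locations (assumed paths), or return None."""
    boot_path = f"/boot/config-{release}"
    if os.path.exists(boot_path):
        with open(boot_path, encoding="utf-8", errors="replace") as f:
            return f.read()
    if os.path.exists("/proc/config.gz"):
        with gzip.open("/proc/config.gz", "rt", encoding="utf-8", errors="replace") as f:
            return f.read()
    return None


def main():
    release = platform.release()
    try:
        print("kernel >= 4.18:", kernel_at_least(release))
    except ValueError as err:
        print(err)
    config = read_kernel_config(release)
    if config is None:
        print("kernel config not found; cannot check CONFIG_BPF_KPROBE_OVERRIDE")
    else:
        print("CONFIG_BPF_KPROBE_OVERRIDE enabled:", config_enabled(config))


if __name__ == "__main__":
    main()
```

Run the script on the node itself rather than inside a pod, because a container image may not ship the host's kernel config file.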
Below is a sample KernelChaos configuration file:
For more sample files, see examples. You can edit them as needed.
Description:
failkernRequest defines the specified injection mode (kmalloc, bio, etc.) with a call chain and an optional set of predicates. The fields are:
failtype indicates what to fail. It can be set to 0, 1, or 2:
- If 0, slab allocation fails (should_failslab).
- If 1, page allocation fails (should_fail_alloc_page).
- If 2, bio fails (should_fail_bio).
For more information, see inject_example.
callchain indicates a specific call chain, with an optional set of predicates and an optional set of parameters that are used with the predicates. If there is no specific call chain, leave callchain empty, which means the fault is injected on any call chain that reaches slab allocation (e.g. kmalloc). The type of callchain is an array of frames. Each frame has three fields, including:
- parameters is used with predicate. For example, if you want to inject a slab error in d_alloc_parallel(struct dentry *parent, const struct qstr *name) only for the special name bananas, you need to set parameters; otherwise, omit it.
- predicate accesses the arguments of this frame. Continuing the parameters example, you can set it to STRNCMP(name->name, "bananas", 8) so that the fault is injected only for that name, or omit it to inject on every d_alloc_parallel call chain.
headers indicates the kernel headers you need, e.g. "linux/mmzone.h" and "linux/blkdev.h".
times indicates the maximum number of times the fault is injected.
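To make the field descriptions above concrete, here is a rough sketch that assembles a failkernRequest as a plain dictionary and prints it as JSON. It is an illustration only, not the Chaos Mesh API: the helper names are hypothetical, the frame key funcname is an assumption (the function name of a frame is not spelled out above), and the parameters string is assumed to be the parameter list of d_alloc_parallel as quoted above.

```python
import json

# Kernel hooks per failtype, taken from the failtype description above.
FAILTYPE_HOOKS = {
    0: "should_failslab",
    1: "should_fail_alloc_page",
    2: "should_fail_bio",
}


def make_frame(funcname, parameters=None, predicate=None):
    """Build one call chain frame; 'funcname' is an assumed key name."""
    frame = {"funcname": funcname}
    if parameters is not None:
        frame["parameters"] = parameters
    if predicate is not None:
        frame["predicate"] = predicate
    return frame


def make_fail_kern_request(failtype, callchain=(), headers=(), times=None):
    """Build a failkernRequest dictionary, omitting fields that are not set."""
    if failtype not in FAILTYPE_HOOKS:
        raise ValueError(f"failtype must be 0, 1, or 2, got {failtype!r}")
    request = {"failtype": failtype}
    if callchain:
        request["callchain"] = list(callchain)
    if headers:
        request["headers"] = list(headers)
    if times is not None:
        request["times"] = times
    return request


if __name__ == "__main__":
    frame = make_frame(
        "d_alloc_parallel",
        # Assumed value: the parameter list of d_alloc_parallel.
        parameters="struct dentry *parent, const struct qstr *name",
        predicate='STRNCMP(name->name, "bananas", 8)',
    )
    request = make_fail_kern_request(failtype=0, callchain=[frame], times=1)
    print(json.dumps({"failkernRequest": request}, indent=2))
```

Leaving callchain out of the dictionary corresponds to the empty call chain case described above, where the fault is injected on any call chain.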
duration defines the duration for each chaos experiment. In the sample file above, the chaos lasts for 10 seconds.
scheduler defines the scheduler rules for the running time of the chaos experiment. For more information about the rules, see robfig/cron.
KernelChaos’s function is similar to , which guarantees the appropriate erroneous return of the specified injection mode (kmalloc, bio, etc.) given a call chain and an optional set of predicates.
You can read inject_example.txt to learn more.
Below is a sample program:
During the injection, the output is similar to this:
When failtype is 1, physical page allocation fails. If allocations are repeated continuously within a very short time (e.g. `while (1) {memset(malloc(1M), '1', 1M)}`), the system's oom-killer is woken up to release memory. In that case, the oom-killer is not limited by container_id.
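The allocation pattern in the parenthetical above can be reproduced in a bounded form when you want to observe the behavior in a test container. This is a rough sketch with a hypothetical helper, touch_memory. Unlike the unbounded loop quoted above, it stops after a fixed amount of memory, so by itself it should not exhaust a normally sized host.

```python
MIB = 1024 * 1024


def touch_memory(total_mib, chunk_mib=1):
    """Allocate and fill memory in chunk_mib pieces; return the bytes touched."""
    chunks = []
    touched = 0
    while touched < total_mib * MIB:
        # Build a fully written buffer, similar to malloc followed by memset.
        chunk = bytearray(b"1" * (chunk_mib * MIB))
        chunks.append(chunk)  # keep a reference so the memory is not freed
        touched += len(chunk)
    return touched


if __name__ == "__main__":
    print("touched bytes:", touch_memory(total_mib=16))
```

Keep total_mib small and run it only in a disposable environment, since the oom-killer may terminate processes outside the target container, as noted above.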