KernelChaos Experiment
Although KernelChaos targets a specific pod, other pods on the same host can also be affected, depending on the call chain and how often failures are injected. This is because all pods on a host share the same kernel.
Prerequisites:
- Linux kernel: version >= 4.18
- in values.yaml
Below is a sample KernelChaos configuration file:
For more sample files, see . You can edit them as needed.
Description:
- selector specifies the target pods for chaos injection. For more details, see Define the Scope of Chaos Experiment.
- failkernRequest defines the injection mode (kmalloc, bio, etc.) together with a call chain and an optional set of predicates. Its fields are:
  - failtype indicates what to fail. It can be set to 0, 1, or 2:
    - If 0, indicates slab allocation to fail (should_failslab).
    - If 1, indicates alloc_page to fail (should_fail_alloc_page).
    - If 2, indicates bio to fail (should_fail_bio).
For more information, see and inject_example.
  - callchain indicates a specific call chain, that is, the sequence of kernel functions that must be on the stack when the failure is injected.
- duration defines the duration of each chaos experiment. In the sample file above, the chaos lasts for 10 seconds.
KernelChaos’s function is similar to , which guarantees the appropriate erroneous return of the specified injection mode (kmalloc, bio, etc.) given a call chain and an optional set of predicates.
You can read inject_example.txt to learn more.
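To make the call-chain idea concrete, here is a simplified C model of the decision described above. It is a sketch under stated assumptions, not how KernelChaos or its BPF backend is implemented: struct fail_request and should_inject are hypothetical names, the mapping of failtype 0/1/2 to should_failslab, should_fail_alloc_page, and should_fail_bio is assumed, predicates are left out, and treating an empty callchain as "match any stack" is also an assumption. The model fails the hooked call only when the innermost frame is the hook selected by failtype and every callchain entry appears in the stack in the same order, outermost caller first (entries need not be adjacent).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Kernel fault-injection hooks indexed by failtype (assumed numbering). */
static const char *const FAIL_HOOKS[] = {
    "should_failslab",        /* 0: slab (kmalloc) allocation */
    "should_fail_alloc_page", /* 1: physical page allocation */
    "should_fail_bio",        /* 2: block I/O */
};

/* Hypothetical model of a failkernRequest, not the real CRD schema. */
struct fail_request {
    int failtype;                 /* index into FAIL_HOOKS */
    const char *const *callchain; /* outermost caller first */
    size_t chain_len;             /* 0 means "no call chain restriction" */
};

/* stack[0] is the outermost frame, stack[depth - 1] the innermost.
 * Returns true when the innermost frame is the hook chosen by failtype
 * and all callchain entries occur, in order, in the frames above it. */
bool should_inject(const struct fail_request *req,
                   const char *const *stack, size_t depth) {
    assert(req != NULL);
    size_t n_hooks = sizeof FAIL_HOOKS / sizeof FAIL_HOOKS[0];
    if (req->failtype < 0 || (size_t)req->failtype >= n_hooks || depth == 0)
        return false;
    if (strcmp(stack[depth - 1], FAIL_HOOKS[req->failtype]) != 0)
        return false;

    size_t next = 0; /* next callchain entry still to be matched */
    for (size_t i = 0; i + 1 < depth && next < req->chain_len; i++) {
        if (strcmp(stack[i], req->callchain[next]) == 0)
            next++;
    }
    return next == req->chain_len;
}
```

For example, with the illustrative stack __x64_sys_mount → do_mount → ext4_mount → mount_subtree → kmalloc → should_failslab (made up for this sketch, not captured from a real kernel), a request with failtype 0 and callchain {ext4_mount, mount_subtree} matches, while the same chain in reverse order, or failtype 1, does not.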
Below is a sample program:
During the injection, the output is similar to this:
Although container_id is used to limit fault injection, some failures can still trigger system-wide behavior. For example:
When failtype is 1, physical page allocation fails. If such failures happen continuously within a very short time (for example, a tight loop that keeps allocating memory and writing to it), the system's OOM killer is woken up to release memory. The container_id restriction does not apply to the OOM killer, which acts on the whole host.