KernelChaos Experiment

Although KernelChaos targets a specific pod, the performance of other pods is also affected, to a degree that depends on the specific callchain and its frequency. This is because all pods on the same host share the same kernel.

Linux kernel: version >= 4.18
in values.yaml

Below is a sample KernelChaos configuration file:

More sample files are available, and you can edit them as needed.

Description:

selector specifies the target pods for chaos injection. For more details, see Define the Scope of Chaos Experiment.
failkernRequest defines the specified injection mode (kmalloc, bio, etc.) with a call chain and an optional set of predicates. The fields are:
- failtype indicates what to fail. It can be set to 0 / 1 / 2.
  - If 0, indicates slab to fail (should_failslab)
  - If 1, indicates alloc_page to fail (should_fail_alloc_page)
  - If 2, indicates bio to fail (should_fail_bio)
  For more information, see inject_example.
- callchain indicates a special call chain, such as:

duration defines how long each chaos experiment lasts. In the sample file above, the chaos lasts for 10 seconds.
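The failtype values described above can be summarized as a small lookup. The following C sketch is illustrative only: the function name `failtype_hook` is hypothetical and not part of KernelChaos, and the entries for 0 (`should_failslab`) and for 2 as the bio value are assumptions based on the Linux fault-injection hooks rather than on the text above.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of KernelChaos): maps a failtype value
 * to the name of the kernel fault-injection hook it is assumed to target.
 * Returns NULL for values outside the known range.
 */
const char *failtype_hook(int failtype)
{
    switch (failtype) {
    case 0:
        return "should_failslab";        /* assumption: slab allocation */
    case 1:
        return "should_fail_alloc_page"; /* physical page allocation */
    case 2:
        return "should_fail_bio";        /* assumption: bio is value 2 */
    default:
        return NULL;                     /* unknown failtype */
    }
}
```

A caller could use such a lookup to validate a failtype before building a request, rejecting any value for which it returns NULL.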

KernelChaos works much like the injection tool documented in inject_example.txt: given a call chain and an optional set of predicates, it makes the specified injection mode (kmalloc, bio, etc.) return the appropriate error.

Read inject_example.txt to learn more.

Below is a sample program:
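As a minimal sketch of what such a program might look like, the C code below performs one mount and unmount cycle and prints any error it encounters. It assumes, purely for illustration, that the fault is injected on a call chain reached through mount(2). The helper name `mount_cycle` and any device, mount point, or filesystem type passed to it are hypothetical and do not come from the KernelChaos documentation. It is Linux-specific and needs sufficient privileges to mount.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: attempts one read-only mount followed by an
 * unmount and prints any error to stderr. Returns 0 on success,
 * otherwise the errno value of the first call that failed.
 */
int mount_cycle(const char *source, const char *target, const char *fstype)
{
    if (mount(source, target, fstype, MS_RDONLY | MS_NOSUID, "") < 0) {
        int err = errno; /* save errno before calling other functions */
        fprintf(stderr, "mount: %s\n", strerror(err));
        return err;
    }
    if (umount(target) < 0) {
        int err = errno;
        fprintf(stderr, "umount: %s\n", strerror(err));
        return err;
    }
    return 0;
}
```

A test program would call this helper in a loop from `main`, pausing between iterations, with a device and mount point of your choosing. While a fault is being injected on the mount path, a failed kernel allocation would typically surface as an ENOMEM error from `mount`.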

During the injection, the output is similar to this:

Although container_id is used to limit the scope of fault injection, some behaviors might still trigger system-wide effects. For example:

When failtype is 1, physical page allocation fails. If this happens continuously within a very short time (for example, in a tight loop), the system's oom-killer is woken up to release memory. In that case, the container_id restriction does not apply to the oom-killer.