Extend Chaos Daemon Interface

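This document covers the Selector, the implementation of the gRPC interface in Chaos Daemon, and the verification of the experiment output.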

In api/v1alpha1/helloworldchaos_type.go, you have defined HelloWorldSpec, which includes ContainerSelector.

In Chaos Mesh, a Selector defines the scope of a chaos experiment: the target namespaces, annotations, labels, and so on. A Selector can also hold more specific values (for example, AWSSelector in AWSChaos). Usually, each chaos experiment requires only one Selector. NetworkChaos is an exception, because a network partition sometimes needs two Selectors to describe its two sides.
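The following sketch shows how an embedded selector ends up at the top level of a spec. It is a minimal illustration, not the Chaos Mesh type definitions: `PodSelector`, `ContainerSelector`, and `HelloWorldSpec` are simplified stand-ins declared here, and `parseSpec` is a hypothetical helper. The field names (`selector`, `namespaces`, `mode`, `duration`) are taken from the example manifest later in this document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PodSelector is a simplified stand-in for a selector that scopes an
// experiment. Only namespaces are modeled here.
type PodSelector struct {
	Namespaces []string `json:"namespaces,omitempty"`
}

// ContainerSelector is a simplified stand-in: a selector plus a mode.
type ContainerSelector struct {
	Selector PodSelector `json:"selector"`
	Mode     string      `json:"mode"`
}

// HelloWorldSpec embeds ContainerSelector. encoding/json flattens embedded
// structs, so "selector" and "mode" sit next to "duration" in the spec.
type HelloWorldSpec struct {
	ContainerSelector
	Duration string `json:"duration,omitempty"`
}

// parseSpec decodes a JSON-encoded spec (hypothetical helper).
func parseSpec(raw string) (HelloWorldSpec, error) {
	var spec HelloWorldSpec
	err := json.Unmarshal([]byte(raw), &spec)
	return spec, err
}

func main() {
	spec, err := parseSpec(`{"selector":{"namespaces":["busybox"]},"mode":"all","duration":"1h"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(spec.Selector.Namespaces, spec.Mode, spec.Duration) // prints: [busybox] all 1h
}
```

Embedding keeps the selector fields reusable across experiment types while the manifest stays flat.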

To allow Chaos Daemon to accept the requests from Chaos Controller Manager, you need to implement a new gRPC interface.

  1. Add the RPC in pkg/chaosdaemon/pb/chaosdaemon.proto:

        service chaosDaemon {
            ...
            rpc ExecHelloWorldChaos(ExecHelloWorldRequest) returns (google.protobuf.Empty) {}
        }

        message ExecHelloWorldRequest {
            string container_id = 1;
        }

    Then regenerate the Go code from this proto file:

        make proto

  2. Implement gRPC services in Chaos Daemon.

    In the pkg/chaosdaemon directory, create a file named helloworld_server.go with the following contents:

        package chaosdaemon

        import (
            "context"

            "github.com/golang/protobuf/ptypes/empty"

            "github.com/chaos-mesh/chaos-mesh/pkg/bpm"
            pb "github.com/chaos-mesh/chaos-mesh/pkg/chaosdaemon/pb"
        )

        // ExecHelloWorldChaos runs `ps aux` in the mount namespace of the
        // target container and logs the output.
        func (s *DaemonServer) ExecHelloWorldChaos(ctx context.Context, req *pb.ExecHelloWorldRequest) (*empty.Empty, error) {
            log.Info("ExecHelloWorldChaos", "request", req)

            // Resolve the container ID to the PID of the container process.
            pid, err := s.crClient.GetPidFromContainerID(ctx, req.ContainerId)
            if err != nil {
                return nil, err
            }

            // Build a command that runs in the mount namespace of that PID.
            cmd := bpm.DefaultProcessBuilder("sh", "-c", "ps aux").
                SetNS(pid, bpm.MountNS).
                SetContext(ctx).
                Build()

            out, err := cmd.Output()
            if err != nil {
                return nil, err
            }
            if len(out) != 0 {
                log.Info("cmd output", "output", string(out))
            }

            return &empty.Empty{}, nil
        }

  3. Send a gRPC request when applying the chaos experiment.

    Each chaos experiment has a life cycle: apply and then recover. However, some chaos experiments cannot be recovered by default (for example, PodKill in PodChaos, and HelloWorldChaos). These are called OneShot experiments. You can find +chaos-mesh:oneshot=true in the file that defines the schema type of the chaos experiment.

    Chaos Controller Manager needs to send a request to Chaos Daemon when HelloWorldChaos is in the apply phase. To do this, you need to modify controllers/chaosimpl/helloworldchaos/types.go.

    :::note
    In this chaos experiment, there is no need to recover the chaos action, because HelloWorldChaos is a OneShot experiment. For a chaos experiment type that you develop, you can implement the recovery logic as needed.
    :::

To verify the experiment, perform the following steps.

  1. Update Chaos Mesh:

  2. Deploy the target Pod for testing. Skip this step if you have already deployed this Pod:

        kubectl apply -f https://raw.githubusercontent.com/chaos-mesh/apps/master/ping/busybox-statefulset.yaml

    Then create a YAML file for the chaos experiment with the following content:

        apiVersion: chaos-mesh.org/v1alpha1
        kind: HelloWorldChaos
        metadata:
          name: busybox-helloworld-chaos
        spec:
          selector:
            namespaces:
              - busybox
          mode: all
          duration: 1h

  3. Apply the chaos experiment:

  4. Verify the results. You can check several logs:

    • Check the logs of Chaos Controller Manager:

        kubectl logs chaos-controller-manager-{pod-post-fix} -n chaos-testing

    Example output:

        2021-06-25T06:02:12.754Z INFO records apply chaos {"id": "busybox/busybox-1/busybox"}
        2021-06-25T06:02:12.754Z INFO helloworldchaos Apply helloworld chaos

    • Check the logs of Chaos Daemon:

        kubectl logs chaos-daemon-{pod-post-fix} -n chaos-testing

    Example output:

    You can see ps aux in two separate lines, which correspond to two different Pods.

    :::note
    If your cluster has multiple nodes, there is more than one Chaos Daemon Pod. Check the logs of each Chaos Daemon Pod to find the one that is being called.
    :::

If you encounter any problems in this process, create an issue in the Chaos Mesh repository.

If you are curious about how all of this works, you can read the README files of the different controllers in the controllers directory to learn what each one does. For example, controllers/common/README.md.