Canary Rollout

    1. Make sure you have already enabled the kruise-rollout addon; the canary rollout capability relies on Rollouts from OpenKruise.

    2. Make sure an ingress controller is available in your Kubernetes cluster. If you don’t have one, you can enable an ingress addon such as ingress-nginx.

      Refer to the addon doc to get the access address of the gateway.

    3. Some commands, such as rollback, rely on vela-cli >= 1.5.0-alpha.1, so please upgrade the command line tool. You don’t need to upgrade the controller.
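
    The addons named in the prerequisites above can be enabled from the command line. The sketch below wraps two `vela addon enable` calls in a function. It assumes the rollout addon is named kruise-rollout (matching the trait used later), and `VELA` is a hypothetical override variable, not part of vela itself, so the commands can be previewed before touching a cluster.

```shell
#!/bin/sh
# Sketch: enable the addons this walkthrough depends on.
# Assumption: the rollout addon is named "kruise-rollout", like the trait.
# VELA is a hypothetical override for the CLI binary; it defaults to "vela".
enable_canary_addons() {
  "${VELA:-vela}" addon enable kruise-rollout &&
  "${VELA:-vela}" addon enable ingress-nginx
}
```

    Calling `enable_canary_addons` runs both commands against the current cluster; `VELA=echo enable_canary_addons` only prints the arguments that would be passed.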

    Limitations

    Kubernetes resources can be anything, but this canary rollout only works within the following limits:

    1. The Kubernetes resources contain a service pointing to the workload, along with an ingress routing to the service.
    2. Supported workloads are Kubernetes Deployment, StatefulSet, and OpenKruise CloneSet. The workload you specify must be one of these three types.

    To use the canary rollout, add the kruise-rollout trait when you first deploy the application; the configuration takes effect from the next release. Deploy the application with the trait as shown below:

    cat <<EOF | vela up -f -
    apiVersion: core.oam.dev/v1beta1
    kind: Application
    metadata:
      name: canary-demo
      namespace: default
      annotations:
        app.oam.dev/publishVersion: v1
    spec:
      components:
      - name: canary-demo
        properties:
          objects:
          - apiVersion: apps/v1
            kind: Deployment
            metadata:
              name: canary-demo
            spec:
              replicas: 5
              selector:
                matchLabels:
                  app: demo
              template:
                metadata:
                  labels:
                    app: demo
                spec:
                  containers:
                  - image: barnett/canarydemo:v1
                    name: demo
                    ports:
                    - containerPort: 8090
          - apiVersion: v1
            kind: Service
            metadata:
              labels:
                app: demo
              name: canary-demo
              namespace: default
            spec:
              ports:
              - name: http
                port: 8090
                protocol: TCP
                targetPort: 8090
              selector:
                app: demo
          - apiVersion: networking.k8s.io/v1
            kind: Ingress
            metadata:
              labels:
                app: demo
              name: canary-demo
              namespace: default
            spec:
              ingressClassName: nginx
              rules:
              - host: canary-demo.com
                http:
                  paths:
                  - backend:
                      service:
                        name: canary-demo
                        port:
                          number: 8090
                    path: /version
                    pathType: ImplementationSpecific
        type: k8s-objects
        traits:
        - type: kruise-rollout
          properties:
            canary:
              steps:
              # First batch: release 20% of the Pods and route 20% of the traffic to the new version; manual confirmation is required before the next batch.
              - weight: 20
              # Second batch: release 90% of the Pods and route 90% of the traffic to the new version.
              - weight: 90
              trafficRoutings:
              - type: nginx
    EOF

    Here’s an overview of what happens during an upgrade under this kruise-rollout trait configuration. The whole process is divided into 3 steps:

    1. When the upgrade starts, a new canary deployment is created with 20% of the total replicas. In our example there are 5 replicas in total, so it keeps all the old ones, creates 5 * 20% = 1 replica for the new canary, and routes 20% of the traffic to it. Once everything is ready, it waits for manual approval.
      • By default, the percentage of replicas is aligned with the traffic weight; you can also configure the replicas individually according to this doc.
    2. After the manual approval, the second batch starts. It creates 5 * 90% = 4.5, rounded up to 5, replicas of the new version and routes 90% of the traffic to them. As a result, the system now has 10 replicas in total. It then waits for a second manual approval.
    3. After the second approval, the rollout finishes and all traffic is served by the new version.
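
    The replica counts in the steps above can be reproduced with integer arithmetic. This is only a sketch: rounding up is inferred from the 4.5 to 5 example in the text, not confirmed against the controller, and `canary_replicas` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: canary replicas for a step = ceil(total * weight / 100).
# Assumption: fractional results are rounded up, as in the 4.5 -> 5 example.
canary_replicas() {
  total=$1
  weight=$2
  echo $(( (total * weight + 99) / 100 ))
}

canary_replicas 5 20   # prints 1
canary_replicas 5 90   # prints 5
```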

    Let’s continue the demo. The first deployment is no different from a normal deploy; check the status of the application to make sure it’s running before the next step.

    vela status canary-demo

    Access the gateway endpoint with the specific host.

    $ curl -H "Host: canary-demo.com" <ingress-controller-address>/version
    Demo: V1

    Modify the image tag from v1 to v2 (barnett/canarydemo:v2) and redeploy the application with vela up.
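
    One way to produce the upgraded manifest is to rewrite the v1 manifest with sed. This is a sketch under assumptions: `bump_to_v2` and the file name canary-demo.yaml are hypothetical, and bumping the app.oam.dev/publishVersion annotation together with the image tag is an assumption rather than something stated here.

```shell
#!/bin/sh
# Sketch: rewrite a v1 manifest (stdin) into a v2 manifest (stdout).
# Assumption: the publishVersion annotation is bumped alongside the image tag.
bump_to_v2() {
  sed -e 's|barnett/canarydemo:v1|barnett/canarydemo:v2|' \
      -e 's|app.oam.dev/publishVersion: v1|app.oam.dev/publishVersion: v2|'
}

# Usage (canary-demo.yaml is a hypothetical file holding the Application above):
#   bump_to_v2 < canary-demo.yaml | vela up -f -
```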

    This creates a canary deployment and waits for manual approval. Check the status of the application:

    $ vela status canary-demo
    About:
      Name: canary-demo
      Namespace: default
      Created at: 2022-06-09 16:43:10 +0800 CST
      Status: runningWorkflow
      mode: DAG
      finished: false
      Suspend: false
      Terminated: false
      Steps
      - id:8adxa11wku
        name:canary-demo
        type:apply-component
        phase:running
        message:wait healthy
    Services:
      - Name: canary-demo
        Cluster: local Namespace: default
        Type: webservice
        Unhealthy Ready:5/5
        Traits:
          scaler gateway: No loadBalancer found, visiting by using 'vela port-forward canary-demo'
          kruise-rollout: Rollout is in step(1/1), and you need manually confirm to enter the next step

    The application’s status is runningWorkflow, which means the rollout process has not finished yet.

    View the topology graph again: the kruise-rollout trait has created a v2 pod, which serves the canary traffic. Meanwhile, the v1 pods are still running and serve the non-canary traffic.


    Access the gateway endpoint again. There is about a 20% chance of getting the Demo: V2 result.

    $ curl -H "Host: canary-demo.com" <ingress-controller-address>/version
    Demo: V2
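
    A single request only shows one side of the split. To observe the approximate ratio, the endpoint can be sampled many times and the responses counted. The sketch below is illustrative: `sample_versions` and `tally_versions` are hypothetical helper names, and the address argument stands for the same <ingress-controller-address> placeholder used above.

```shell
#!/bin/sh
# Sketch: send N requests through the ingress, one response per line.
# $1 = ingress controller address (placeholder, as in the text), $2 = request count.
sample_versions() {
  i=0
  while [ "$i" -lt "$2" ]; do
    curl -s -H "Host: canary-demo.com" "$1/version"
    echo
    i=$((i + 1))
  done
}

# Count identical non-empty lines read from stdin, e.g. "18 Demo: V2".
tally_versions() {
  awk 'NF { count[$0]++ } END { for (v in count) print count[v], v }' | sort -k2
}
```

    For example, `sample_versions <ingress-controller-address> 100 | tally_versions` prints one count per distinct response; with the first canary step active, roughly a fifth of the responses should be Demo: V2.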

    Continue Canary Process

    vela workflow resume canary-demo

    Access the gateway endpoint several more times. The chance of getting Demo: V2 is now much higher, about 90%.

    $ curl -H "Host: canary-demo.com" <ingress-controller-address>/version
    Demo: V2

    Finally, resume the workflow once more (vela workflow resume canary-demo) to finish the rollout process.

    Access the gateway endpoint several more times. The result is now always Demo: V2.

    $ curl -H "Host: canary-demo.com" <ingress-controller-address>/version
    Demo: V2

    Canary verification failed: roll back the release

    If, after manual verification, you want to cancel the rollout and roll the application back to its last successfully released version, you can roll back the rollout workflow.

    Suspend the workflow before rolling back:

    $ vela workflow suspend canary-demo
    Rollout default/canary-demo in cluster suspended.
    Successfully suspend workflow: canary-demo

    Then roll back:

    $ vela workflow rollback canary-demo
    Application spec rollback successfully.
    Application status rollback successfully.
    Rollout default/canary-demo in cluster rollback.
    Successfully rollback rollout
    Application outdated revision cleaned up.
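
    Because the suspend must come before the rollback, the two commands can be chained so that the rollback only runs if the suspend succeeded. This is a minimal sketch: `rollback_canary` is a hypothetical helper, and `VELA` is a hypothetical override variable (not part of vela) that allows a dry run.

```shell
#!/bin/sh
# Sketch: suspend the workflow, then roll back; stop if the suspend fails.
# VELA is a hypothetical override for the CLI binary; it defaults to "vela".
rollback_canary() {
  app=$1
  "${VELA:-vela}" workflow suspend "$app" &&
  "${VELA:-vela}" workflow rollback "$app"
}
```

    `rollback_canary canary-demo` runs both steps in order; `VELA=echo rollback_canary canary-demo` prints them without contacting the cluster.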

    Access the gateway endpoint again. The result is now always Demo: V1.

    $ curl -H "Host: canary-demo.com" <ingress-controller-address>/version
    Demo: V1

    Any rollback performed in the middle of a runningWorkflow rolls the application back to its latest succeeded revision. For example, if you deploy v1 successfully and upgrade to v2, but v2 does not succeed and you go on to upgrade to v3, rolling back v3 returns the application to v1, because v2 is not a succeeded release.
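
    The rule above amounts to "pick the most recent revision that succeeded". The toy model below illustrates that selection only; the "<revision> <state>" input format and the `rollback_target` helper are hypothetical and do not reflect how KubeVela stores revisions.

```shell
#!/bin/sh
# Sketch: pick the rollback target from a revision history on stdin.
# Each line is "<revision> <state>", oldest first (hypothetical format).
rollback_target() {
  awk '$2 == "succeeded" { target = $1 } END { if (target != "") print target }'
}

printf '%s\n' 'v1 succeeded' 'v2 failed' 'v3 running' | rollback_target   # prints v1
```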