Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments)

Overview

Flagger is a progressive delivery tool that automates the release process for applications running on Kubernetes. It reduces the risk of introducing a new software version in production by gradually shifting traffic to the new version while measuring metrics and running conformance tests.

Flagger implements several deployment strategies (Canary releases, A/B testing, Blue/Green mirroring) using a service mesh (App Mesh, Istio, Linkerd) or an ingress controller (Contour, Gloo, NGINX, Skipper, Traefik) for traffic routing. For release analysis, Flagger can query Prometheus, Datadog, New Relic or CloudWatch, and for alerting it uses Slack, MS Teams, Discord and Rocket.

Flagger is a Cloud Native Computing Foundation project and part of the Flux family of GitOps tools.

Documentation

Flagger documentation can be found at docs.flagger.app.

Who is using Flagger

List of organizations using Flagger:

If you are using Flagger, please submit a PR to add your organization to the list!

Canary CRD

Flagger takes a Kubernetes deployment and optionally a horizontal pod autoscaler (HPA), then creates a series of objects (Kubernetes deployments, ClusterIP services, service mesh or ingress routes). These objects expose the application on the mesh and drive the canary analysis and promotion.

Flagger keeps track of ConfigMaps and Secrets referenced by a Kubernetes Deployment and triggers a canary analysis if any of those objects change. When promoting a workload in production, both code (container images) and configuration (ConfigMaps and Secrets) are synchronised.
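
For example, a tracked Deployment only needs to reference its configuration in the usual way; a minimal sketch (the ConfigMap name and image tag below are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
  namespace: test
spec:
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      labels:
        app: podinfo
    spec:
      containers:
        - name: podinfo
          image: ghcr.io/stefanprodan/podinfo:6.0.0
          envFrom:
            # Flagger tracks this ConfigMap; editing it triggers a new canary analysis
            - configMapRef:
                name: podinfo-config

On promotion, Flagger copies the referenced ConfigMaps and Secrets to their primary counterparts, so the primary workload always runs the validated combination of code and configuration.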

For a deployment named podinfo, a canary promotion can be defined using Flagger's custom resource:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # service mesh provider (optional)
  # can be: kubernetes, istio, linkerd, appmesh, nginx, skipper, contour, gloo, supergloo, traefik
  provider: istio
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 60
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    # service name (defaults to targetRef.name)
    name: podinfo
    # ClusterIP port number
    port: 9898
    # container port name or number (optional)
    targetPort: 9898
    # port name can be http or grpc (default http)
    portName: http
    # add all the other container ports
    # to the ClusterIP services (default false)
    portDiscovery: true
    # HTTP match conditions (optional)
    match:
      - uri:
          prefix: /
    # HTTP rewrite (optional)
    rewrite:
      uri: /
    # request timeout (optional)
    timeout: 5s
  # promote the canary without analysing it (default false)
  skipAnalysis: false
  # define the canary analysis timing and KPIs
  analysis:
    # schedule interval (default 60s)
    interval: 1m
    # max number of failed metric checks before rollback
    threshold: 10
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 5
    # validation (optional)
    metrics:
    - name: request-success-rate
      # builtin Prometheus check
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      # builtin Prometheus check
      # maximum req duration P99
      # milliseconds
      thresholdRange:
        max: 500
      interval: 30s
    - name: "database connections"
      # custom metric check
      templateRef:
        name: db-connections
      thresholdRange:
        min: 2
        max: 100
      interval: 1m
    # testing (optional)
    webhooks:
      - name: "conformance test"
        type: pre-rollout
        url: http://flagger-helmtester.test/
        timeout: 5m
        metadata:
          type: "helmv3"
          cmd: "test run podinfo -n test"
      - name: "load test"
        type: rollout
        url: http://flagger-loadtester.test/
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://podinfo.test:9898/"
    # alerting (optional)
    alerts:
      - name: "dev team Slack"
        severity: error
        providerRef:
          name: dev-slack
          namespace: flagger
      - name: "qa team Discord"
        severity: warn
        providerRef:
          name: qa-discord
      - name: "on-call MS Teams"
        severity: info
        providerRef:
          name: on-call-msteams
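
The custom metric check above references a MetricTemplate by name. A minimal sketch of what the db-connections template could look like, assuming a hypothetical database exporter and an in-cluster Prometheus address:

apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: db-connections
  namespace: test
spec:
  provider:
    type: prometheus
    # illustrative Prometheus address
    address: http://prometheus.monitoring:9090
  query: |
    sum(
      max_over_time(
        pg_stat_activity_count{namespace="{{ namespace }}"}[{{ interval }}]
      )
    )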

For more details on how the canary analysis and promotion works please read the docs.
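
The alerts work the same way: each entry references an AlertProvider object. A minimal sketch of the dev-slack provider (the channel, username and secret name are assumptions):

apiVersion: flagger.app/v1beta1
kind: AlertProvider
metadata:
  name: dev-slack
  namespace: flagger
spec:
  type: slack
  channel: general
  username: flagger
  # the Slack webhook address is read from a secret (key "address")
  # to keep it out of Git
  secretRef:
    name: slack-url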

Features

Service Mesh

| Feature | App Mesh | Istio | Linkerd | Kubernetes CNI |
| --- | --- | --- | --- | --- |
| Canary deployments (weighted traffic) | ✔️ | ✔️ | ✔️ | — |
| A/B testing (headers and cookies routing) | ✔️ | ✔️ | — | — |
| Blue/Green deployments (traffic switch) | ✔️ | ✔️ | ✔️ | ✔️ |
| Blue/Green deployments (traffic mirroring) | — | ✔️ | — | — |
| Webhooks (acceptance/load testing) | ✔️ | ✔️ | ✔️ | ✔️ |
| Manual gating (approve/pause/resume) | ✔️ | ✔️ | ✔️ | ✔️ |
| Request success rate check (L7 metric) | ✔️ | ✔️ | ✔️ | — |
| Request duration check (L7 metric) | ✔️ | ✔️ | ✔️ | — |
| Custom metric checks | ✔️ | ✔️ | ✔️ | ✔️ |

Ingress

| Feature | Contour | Gloo | NGINX | Skipper | Traefik |
| --- | --- | --- | --- | --- | --- |
| Canary deployments (weighted traffic) | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| A/B testing (headers and cookies routing) | ✔️ | ✔️ | ✔️ | — | — |
| Blue/Green deployments (traffic switch) | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| Webhooks (acceptance/load testing) | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| Manual gating (approve/pause/resume) | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| Request success rate check (L7 metric) | ✔️ | ✔️ | ✔️ | — | ✔️ |
| Request duration check (L7 metric) | ✔️ | ✔️ | ✔️ | — | ✔️ |
| Custom metric checks | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
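
Manual gating, listed in both matrices above, is driven by a confirm-rollout webhook that holds the analysis until the gate is opened; a minimal sketch, assuming the demo load tester is installed in the test namespace:

analysis:
  webhooks:
    - name: "ask for confirmation"
      type: confirm-rollout
      # the canary is paused until this endpoint returns HTTP 200
      url: http://flagger-loadtester.test/gate/check

The gate can then be opened or closed by POSTing the canary name and namespace to the load tester's /gate/open and /gate/close endpoints.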

Roadmap

GitOps Toolkit compatibility

  • Migrate Flagger to Kubernetes controller-runtime and kubebuilder
  • Make the Canary status compatible with kstatus
  • Make Flagger emit Kubernetes events compatible with Flux v2 notification API
  • Integrate Flagger into Flux v2 as the progressive delivery component

Integrations

  • Add support for Kubernetes Ingress v2
  • Add support for SMI compatible service mesh solutions like Open Service Mesh and Consul Connect
  • Add support for ingress controllers like HAProxy and ALB
  • Add support for metrics providers like InfluxDB, Stackdriver, SignalFX

Contributing

Flagger is Apache 2.0 licensed and accepts contributions via GitHub pull requests. To start contributing please read the development guide.

When submitting bug reports, please include as many details as possible:

  • which Flagger version
  • which Flagger CRD version
  • which Kubernetes version
  • what configuration (canary, ingress and workloads definitions)
  • what happened (Flagger and Proxy logs)

Getting Help

If you have any questions about Flagger and progressive delivery:

Your feedback is always welcome!

Comments
  • Specifying multiple HTTP match uri in Istio Canary deployment via Flagger

    I am going to use automatic Canary deployments, so I tried to follow the process via Flagger. Here was my VirtualService file for routing:

    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: {{ .Values.project }}
      namespace: {{ .Values.service.namespace }}
    spec:
      hosts:
        - {{ .Values.subdomain }}
      gateways:
        - mygateway.istio-system.svc.cluster.local
      http:
        {{- range $key, $value := .Values.routing.http }}
        - name: {{ $key }}
    {{ toYaml $value | indent 6 }}
        {{- end }}
    

    The routing part looks like this:

    http:
        r1:
          match:
            - uri:
                prefix: /myservice/monitor
          route:
            - destination:
                host: myservice
                port:
                  number: 9090
        r2:
          match:
            - uri:
                prefix: /myservice
          route:
            - destination:
                host: myservice
                port:
                  number: 8080
          corsPolicy:
            allowCredentials: false
            allowHeaders:
            - X-Tenant-Identifier
            - Content-Type
            - Authorization
            allowMethods:
            - GET
            - POST
            - PATCH
            allowOrigin:
            - "*"
            maxAge: 24h
    

    However, since I found that Flagger overwrites the VirtualService, I removed this file and modified the canary.yaml file based on my requirements, but I get a YAML error:

    {{- if .Values.canary.enabled }}
    apiVersion: flagger.app/v1alpha3
    kind: Canary
    metadata:
      name: {{ .Values.project }}
      namespace: {{ .Values.service.namespace }}
      labels:
        app: {{ .Values.project }}
        chart: {{ template "myservice-chart.chart" . }}
        release: {{ .Release.Name }}
        heritage: {{ .Release.Service }}
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name:  {{ .Values.project }}
      progressDeadlineSeconds: 60
      autoscalerRef:
        apiVersion: autoscaling/v2beta1
        kind: HorizontalPodAutoscaler
        name:  {{ .Values.project }}    
      service:
        port: 8080
        portDiscovery: true
        {{- if .Values.canary.istioIngress.enabled }}
        gateways:
        -  {{ .Values.canary.istioIngress.gateway }}
        hosts:
        - {{ .Values.canary.istioIngress.host }}
        {{- end }}
        trafficPolicy:
          tls:
            # use ISTIO_MUTUAL when mTLS is enabled
            mode: DISABLE
        # HTTP match conditions (optional)
        match:
          - uri:
              prefix: /myservice
        # cross-origin resource sharing policy (optional)
          corsPolicy:
            allowOrigin:
              - "*"
            allowMethods:
              - GET
              - POST
              - PATCH
            allowCredentials: false
            allowHeaders:
              - X-Tenant-Identifier
              - Content-Type
              - Authorization
            maxAge: 24h
          - uri:
              prefix: /myservice/monitor
      canaryAnalysis:
        interval: {{ .Values.canary.analysis.interval }}
        threshold: {{ .Values.canary.analysis.threshold }}
        maxWeight: {{ .Values.canary.analysis.maxWeight }}
        stepWeight: {{ .Values.canary.analysis.stepWeight }}
        metrics:
        - name: request-success-rate
          threshold: {{ .Values.canary.thresholds.successRate }}
          interval: 1m
        - name: request-duration
          threshold: {{ .Values.canary.thresholds.latency }}
          interval: 1m
        webhooks:
          {{- if .Values.canary.loadtest.enabled }}
          - name: load-test-get
            url: {{ .Values.canary.loadtest.url }}
            timeout: 5s
            metadata:
              cmd: "hey -z 1m -q 5 -c 2 http://myservice.default:8080"
          - name: load-test-post
            url: {{ .Values.canary.loadtest.url }}
            timeout: 5s
            metadata:
              cmd: "hey -z 1m -q 5 -c 2 -m POST -d '{\"test\": true}' http://myservice.default:8080/echo"
          {{- end }}  
    {{- end }}
    

    Can anyone help with this issue?

    opened by Mina69 39
  • Add canary finalizers

    @stefanprodan This is a work-in-progress PR looking for acceptance on the approach and feedback. This PR provides the opt-in capability for users to revert Flagger mutations on deletion of a canary. If users opt in, finalizers will be utilized to revert the mutated resources before the canary and owned resources are handed over for finalizing.

    Changes:
    • Add evaluations for finalizers (controller/controller)
    • Add finalizers source (controller/finalizer)
    • Add interface method on deployment and daemonset controllers
    • Add interface method on routers
    • Add e2e tests

    Work to be done: Cover mesh and ingress outside of Istio

    Fix: #388 Fix: #488

    opened by ta924 23
  • Gloo Canary Release Docs Discrepancy

    I am trying to get a simple POC working with Gloo and Flagger, however the example docs don't work out-of-the-box.

    I also noticed the example virtual-service is different in the docs compared to what's in the repo?

    The specifics regarding mapping a virtual-service to an upstream seem to be different in both and I just want to know what I should follow to get this working.

    I would make an issue on Gloo's repository, however I'm unsure if my error stems from Gloo or from me following the wrong docs.

    opened by BailyTroyer 19
  • Only unique values for domains are permitted error with Istio 1.1.0 RC1

    Right now, due to istio limitations, it is not possible to create a virtualservice with a mesh and another host name. For example:

    if I have:

    ...
    gateways:
    - www.myapp.com
    - mesh
    http:
      - match:
        - uri:
            prefix: /api
        route:
        - destination:
            host: api.default.svc.cluster.local
            port:
              number: 80
    

    and

    ...
    gateways:
    - www.myapp.com
    - mesh
    http:
      - match:
        - uri:
            prefix: /internal
        route:
        - destination:
            host: internal.default.svc.cluster.local
            port:
              number: 80
    

    Istio will throw an error

    Only unique values for domains are permitted. Duplicate entry of domain www.myapp.com"
    

    The two ways of fixing this that I see are for Flagger to either:

    1. Create a separate virtualservice and maintain the canary settings for each one correlated to the particular service deployed
    2. Compile all virtualservices together into a singular virtualservice

    Let me know what you think!

    kind/upstream-bug 
    opened by tzilist 19
  • Unable to perform Istio-A/B testing

    Hey guys I have configured Istio as a service mesh in my Kubernetes. I wanted to try A/B testing deployment-strategy along with Flagger.

    I followed the following documentation to set up Flagger:
    1.) https://docs.flagger.app/usage/ab-testing
    2.) https://docs.flagger.app/how-it-works#a-b-testing

    When I check my Kiali dashboard, it shows a VirtualService error: virtualservice:publisher-d8t-v1 Weight sum should be 100.

    And on describing the canary, it fails because no traffic was generated, although I made a POST call to my service and a 200 response status was returned.

    Can you please help me fix this error.

    I have attached the respective screenshots .

    VirtualService Error in Kiali: Screenshot (103)_LI

    Canary Status: Screenshot (109)_LI

    Traffic Generation and its status: Screenshot (107)_LI

    Screenshot (105)_LI

    Can you please help me resolve this issue !!

    Also, according to the Istio documentation, to connect a VirtualService with a DestinationRule we need to use subsets, but I see no subsets being created. How are you able to achieve traffic routing without a subset? I did read a note about keeping a label of app: <deployment name>; is this what makes it work?

    Thanks in advance :)

    question 
    opened by LochanRn 18
  • istio no values found for metric request-success-rate

    Given the following:

          metrics:
          - interval: 1m
            name: request-success-rate
            threshold: 99
          - interval: 30s
            name: request-duration
            threshold: 500
          stepWeight: 10
          threshold: 5
          webhooks:
          - metadata:
              cmd: hey -z 10m -q 10 -c 2 http://conf-day-demo-rest.conf-day-demo:8080/greeting
            name: conf-day-demo-loadtest
            timeout: 5s
            url: http://loadtester.loadtester/
    

    Canary promotion fails with Halt advancement no values found for metric request-success-rate probably conf-day-demo-rest.conf-day-demo is not receiving traffic

    Querying the metrics manually I see metrics for conf-day-demo-rest-primary, but Flagger queries with

        destination_workload=~"{{ .Name }}"

    which returns no data.
    opened by k0da 16
  • Canary ingress nginx prevent update of primary ingress due to admission webhook

    Hi all,

    we have a problem with ingress admission webhook. Using podinfo as example we did a canary deployment.

    Flagger created a second ingress, and after the rollout was done it switched the "canary" annotation from "true" to "false":

    apiVersion:  networking.k8s.io/v1beta1
    kind: Ingress
    metadata:
      annotations:
        kubernetes.io/ingress.class: nginx-v2
        nginx.ingress.kubernetes.io/canary: "false"
    

    I added "test" annotation to main Ingress to trigger update:

    apiVersion: networking.k8s.io/v1beta1
    kind: Ingress
    metadata:
      name: podinfo
      labels:
        app: podinfo
      annotations:
        kubernetes.io/ingress.class: "nginx-v2"
        test: "test"
    ...
    

    Now when I try to apply the main Ingress file I get an admission webhook error:

    Error from server (BadRequest): error when creating "podinfo.yaml": 
    admission webhook "validate.nginx.ingress.kubernetes.io" 
    denied the request: host "example.com" and 
    path "/" is already defined in ingress develop/podinfo-canary
    

    podinfo Ingress

    apiVersion: networking.k8s.io/v1beta1
    kind: Ingress
    metadata:
      name: podinfo
      labels:
        app: podinfo
      annotations:
        kubernetes.io/ingress.class: "nginx-v2"
    spec:
      rules:
        - host: example.com
          http:
            paths:
              - backend:
                  serviceName: podinfo
                  servicePort: 80
      tls:
      - hosts:
        - example.com
        secretName: example.com.wildcard
    

    Flagger version: 1.6.1, ingress-nginx version: 0.43

    opened by vorozhko 15
  • Blue/Green deployment - ELB collides with ClusterIP Flagger services.

    Hey everybody. I wanted to give you some feedback from my learning process using Flagger and ask a couple of questions about how to fix an issue I've been having with my current use case.

    Here it is: I have an EKS cluster with two namespaces, one for testing (called staging) and another for production. I've been trying to add Flagger to the staging namespace in order to enable Blue/Green deployments from my GitLab pipeline.

    How do I do that? Well, I've set up a GitLab job that basically runs a kubectl command and applies the files that I've added below. This is a very basic application, which means I've been trying to implement Blue/Green style deployments with Kubernetes L4 networking.

    Here is the order of how files get applied:

    1. namespace
    2. canary
    3. deployment
    4. service

    I've also created a drawing to help you illustrate the situation a little bit better.

    image

    The problem with this approach is that as soon as I apply the load balancer manifest I get this error:

     The Service "my-app" is invalid: spec.ports[2].name: Duplicate value: "http"
    

    I've tried applying the same configuration in the production environment and it did work. My guess here is that somehow Flagger's ClusterIP services are conflicting with my load balancer, leading to a possible collision between them.

    I hope that you can help me with this issue, I'll keep you posted if I find a solution.

    namespace.yaml

    apiVersion: v1
    kind: Namespace
    metadata:
      name: staging
    

    deployment.yaml

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
      namespace: staging
      labels:
        app: my-app
        environment: staging
    spec:
      replicas: 1
      strategy:
        type: Recreate
      selector:
        matchLabels:
          app: my-app
          environment: staging
      template:
        metadata:
          labels:
            app: my-app
            environment: staging
          annotations:
            configHash: " "
        spec:
          containers:
            - name: my-app
              image: marcoshuck/my-app
              imagePullPolicy: Always
              ports:
                - containerPort: 8001
              envFrom:
                - configMapRef:
                    name: my-app-config
          nodeSelector:
            server: "true"
    

    load-balancer.yaml

    apiVersion: v1
    kind: Service
    metadata:
      name: my-app
      namespace: staging
      annotations:
        # Use HTTP to talk to the backend.
        service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
        # Amazon (AMC) certificate ARN
        service.beta.kubernetes.io/aws-load-balancer-ssl-cert: XXXXXXXXXXXXXXXXXXXXXXXXX
        # Only run SSL on the port named "tls" below.
        service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "https"
    spec:
      type: LoadBalancer
      ports:
      - name: http
        port: 80
        targetPort: 8001
      - name: https
        port: 443
        targetPort: 8001
      selector:
        app: my-app
        environment: staging
    

    canary.yaml

    apiVersion: flagger.app/v1beta1
    kind: Canary
    metadata:
      name: my-app
      namespace: staging
    spec:
      provider: kubernetes
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
      progressDeadlineSeconds: 60
      service:
        port: 8001
        portDiscovery: true
      analysis:
        interval: 30s
        threshold: 3
        iterations: 10
        metrics:
          - name: request-success-rate
            thresholdRange:
              min: 99
            interval: 1m
          - name: request-duration
            thresholdRange:
              max: 500
            interval: 30s
        webhooks:
          - name: load-test
            url: http://flagger-loadtester.test/
            timeout: 5s
            metadata:
              type: cmd
              cmd: "hey -z 1m -q 10 -c 2 http://my-app-canary.test:8001/"
    
    opened by marcoshuck 14
  • progressDeadlineSeconds not working while waiting for rollout to finish

    Hi, in my deployment I use progressDeadlineSeconds: 1200, and in the canary definition I use a canary deployment with the built-in Prometheus checks. The canary app crashed, so the deployment should be rolled back, but it seems it isn't.

    my-app-deployment-58b7ffb786-7dk4h                     1/2     CrashLoopBackOff   109        9h
    my-app-deployment-primary-84f69c75c4-9d7x7             2/2     Running            0          16h
    

    And the Flagger logs always show the following message in an infinite loop:

    {"level":"info","ts":"2020-03-26T01:38:32.078Z","caller":"controller/events.go:27","msg":"canary deployment my-app-deployment.test not ready with retryable true: waiting for
    rollout to finish: 0 of 1 updated replicas are available","canary":"my-app-canary.test"}
    

    The canary I use:

    apiVersion: flagger.app/v1beta1
    kind: Canary
    metadata:
      name: my-app-canary
      namespace: test
    spec:
      # deployment reference
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app-deployment
      # the maximum time in seconds for the canary deployment
      # to make progress before it is rollback (default 600s)
      progressDeadlineSeconds: 1200
      # HPA reference (optional)
      autoscalerRef:
        apiVersion: autoscaling/v2beta1
        kind: HorizontalPodAutoscaler
        name: my-app-hpa
      service:
        # ClusterIP port number
        port: 80
        # container port name or number (optional)
        targetPort: 8080
        # Istio virtual service host names (optional)
        trafficPolicy:
          tls:
            mode: ISTIO_MUTUAL
      analysis:
        # schedule interval (default 60s)
        interval: 1m
        # max number of failed iterations before rollback
        threshold: 5
        # max traffic percentage routed to canary
        # percentage (0-100)
        maxWeight: 50
        # canary increment step
        # percentage (0-100)
        stepWeight: 10
        metrics:
          - name: request-success-rate
            # builtin Prometheus check
            # minimum req success rate (non 5xx responses)
            # percentage (0-100)
            thresholdRange:
              min: 99
            interval: 1m
          - name: request-duration
            # builtin Prometheus check
            # maximum req duration P99
            # milliseconds
            thresholdRange:
              max: 500
            interval: 30s
        webhooks:
          - name: acceptance-test
            type: pre-rollout
            url: http://blueprint-test-loadtester.blueprint-test/
            timeout: 30s
            metadata:
              type: bash
              cmd: "curl http://my-app-deployment-canary.test"
          - name: load-test
            type: rollout
            url: http://blueprint-test-loadtester.blueprint-test/
            timeout: 5s
            metadata:
              cmd: "hey -z 1m -q 10 -c 2 http://my-app-deployment-canary.test"
    
    opened by BarrieShieh 14
  • Add HTTP match conditions to Canary service spec

    Could you show an example of how to use this with the istio ingress? I can't seem to figure out how to point to the correct service!

    More specifically, is it possible to tell the istio ingress to route based on certain criteria (i.e. a uri prefix, etc?)

    kind/feature 
    opened by tzilist 14
  • Flagger omits `TrafficSplit` backend service weight if weight is 0 due to `omitempty` option

    Describe the bug

    Since OSM is supported (SMI support added in #896), I did the following to create a canary deployment using OSM and Flagger. As recommended in #896, I used the MetricTemplate CRDs to create the required Prometheus custom metrics (request-success-rate and request-duration).

    I then created a canary custom resource for the podinfo deployment; however, it does not succeed. It says that the canary custom resource cannot create a TrafficSplit resource for the canary deployment.

    Output excerpt of kubectl describe -f ./podinfo-canary.yaml:

    Status:
      Canary Weight:  0
      Conditions:
        Last Transition Time:  2021-06-07T22:28:21Z
        Last Update Time:      2021-06-07T22:28:21Z
        Message:               New Deployment detected, starting initialization.
        Reason:                Initializing
        Status:                Unknown
        Type:                  Promoted
      Failed Checks:           0
      Iterations:              0
      Last Transition Time:    2021-06-07T22:28:21Z
      Phase:                   Initializing
    Events:
      Type     Reason  Age                  From     Message
      ----     ------  ----                 ----     -------
      Warning  Synced  5m38s                flagger  podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less then desired generation
      Normal   Synced  8s (x12 over 5m38s)  flagger  all the metrics providers are available!
      Warning  Synced  8s (x11 over 5m8s)   flagger  TrafficSplit podinfo.test create error: the server could not find the requested resource (post trafficsplits.split.smi-spec.io)
    


    To Reproduce

    ./kustomize/osm/kustomization.yaml:

    namespace: osm-system
    bases:
      - ../base/flagger/
    patchesStrategicMerge:
      - patch.yaml
    

    ./kustomize/osm/patch.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: flagger
    spec:
      template:
        spec:
          containers:
            - name: flagger
              args:
                - -log-level=info
                - -include-label-prefix=app.kubernetes.io
                - -mesh-provider=smi:v1alpha3
                - -metrics-server=http://osm-prometheus.osm-system.svc:7070
    
    ---
    
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRoleBinding
    metadata:
      name: flagger
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: flagger
    subjects:
      - kind: ServiceAccount
        name: flagger
        namespace: osm-system
    

    Used MetricTemplate CRD to implement required custom metric (recommended in #896) - request-success-rate.yaml:

    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: request-success-rate
      namespace: osm-system
    spec:
      provider:
        type: prometheus
        address: http://osm-prometheus.osm-system.svc:7070
      query: |
        sum(
            rate(
                osm_request_total{
                  destination_namespace="{{ namespace }}",
                  destination_name="{{ target }}",
                  response_code!="404"
                }[{{ interval }}]
            )
        )
        /
        sum(
            rate(
                osm_request_total{
                  destination_namespace="{{ namespace }}",
                  destination_name="{{ target }}"
                }[{{ interval }}]
            )
        ) * 100
    

    Used MetricTemplate CRD to implement required custom metric (recommended in #896) - request-duration.yaml:

    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: request-duration
      namespace: osm-system
    spec:
      provider:
        type: prometheus
        address: http://osm-prometheus.osm-system.svc:7070
      query: |
        histogram_quantile(
          0.99,
          sum(
            rate(
              osm_request_duration_ms{
                destination_namespace="{{ namespace }}",
                destination_name=~"{{ target }}"
              }[{{ interval }}]
            )
          ) by (le)
        )
    

    podinfo-canary.yaml:

    apiVersion: flagger.app/v1beta1
    kind: Canary
    metadata:
      name: podinfo
      namespace: test
    spec:
      provider: "smi:v1alpha3"
      # deployment reference
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: podinfo
      # HPA reference (optional)
      autoscalerRef:
        apiVersion: autoscaling/v2beta2
        kind: HorizontalPodAutoscaler
        name: podinfo
      # the maximum time in seconds for the canary deployment
      # to make progress before it is rollback (default 600s)
      progressDeadlineSeconds: 60
      service:
        # ClusterIP port number
        port: 9898
        # container port number or name (optional)
        targetPort: 9898
      analysis:
        # schedule interval (default 60s)
        interval: 30s
        # max number of failed metric checks before rollback
        threshold: 5
        # max traffic percentage routed to canary
        # percentage (0-100)
        maxWeight: 50
        # canary increment step
        # percentage (0-100)
        stepWeight: 5
        # Prometheus checks
        metrics:
        - name: request-success-rate
          # minimum req success rate (non 5xx responses)
          # percentage (0-100)
          thresholdRange:
            min: 99
          interval: 1m
        - name: request-duration
          # maximum req duration P99
          # milliseconds
          thresholdRange:
            max: 500
          interval: 30s
        # testing (optional)
        webhooks:
          - name: acceptance-test
            type: pre-rollout
            url: http://flagger-loadtester.test/
            timeout: 30s
            metadata:
              type: bash
              cmd: "curl -sd 'test' http://podinfo-canary.test:9898/token | grep token"
          - name: load-test
            type: rollout
            url: http://flagger-loadtester.test/
            metadata:
              cmd: "hey -z 2m -q 10 -c 2 http://podinfo-canary.test:9898/"
    

    Output excerpt of kubectl describe -f ./podinfo-canary.yaml:

    Status:
      Canary Weight:  0
      Conditions:
        Last Transition Time:  2021-06-07T22:28:21Z
        Last Update Time:      2021-06-07T22:28:21Z
        Message:               New Deployment detected, starting initialization.
        Reason:                Initializing
        Status:                Unknown
        Type:                  Promoted
      Failed Checks:           0
      Iterations:              0
      Last Transition Time:    2021-06-07T22:28:21Z
      Phase:                   Initializing
    Events:
      Type     Reason  Age                  From     Message
      ----     ------  ----                 ----     -------
      Warning  Synced  5m38s                flagger  podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less then desired generation
      Normal   Synced  8s (x12 over 5m38s)  flagger  all the metrics providers are available!
      Warning  Synced  8s (x11 over 5m8s)   flagger  TrafficSplit podinfo.test create error: the server could not find the requested resource (post trafficsplits.split.smi-spec.io)
    

    Full output of kubectl describe -f ./podinfo-canary.yaml: https://pastebin.ubuntu.com/p/kB9qtPxZvr/



    Additional context

    • Flagger version: 1.11.0
    • Kubernetes version: 1.19.11
    • Service Mesh provider: smi (through osm)
    • Ingress provider: N/A.
    opened by johnsonshi 13
  • Is there a FIPS compliant version of flagger ?

    Describe the feature

    Is there a FIPS-compliant version of Flagger? Or a (supported) way to run Flagger in FIPS mode?

    Proposed solution

    Any alternatives you've considered?

    Not really.

    opened by mihaimircea 0
  • Which hostname to use for load testing?

    The docs use:

          cmd: "hey -z 1m -q 10 -c 2 -m POST -d '{test: 2}' http://podinfo-canary.test:9898/echo"
    

    I'm confused by the use of podinfo-canary.test, which references the Service. Isn't this bypassing the Ingress Controller, which is gathering the metrics?

    opened by cer 3
  • Raise the priority of rollback check

    Describe the bug

    Some errors might happen during canary initialization, such as the primary or canary deployment not running because of an image pull failure. Then we want to roll back to cancel the canary initialization, but it does not work, because the rollback check comes after the deployment readiness check. Ref code: https://github.com/fluxcd/flagger/blob/main/pkg/controller/scheduler.go#L331

    Expected behavior

    Raise the priority of the rollback check so it runs before the deployment readiness check. Or is there a better strategy?

    opened by imuxin 2
  • Support for Gloo route_options on Canary CR

    Describe the feature

    We would like to use Flagger with Gloo route tables; however, the Canary resource does not support specifying route-specific options (prefixRewrite, JWT/auth disabling, header manipulation, etc.). This is supported by Gloo on both the VirtualService and RouteTable resources: https://docs.solo.io/gloo-edge/latest/reference/api/github.com/solo-io/gloo/projects/gloo/api/v1/options.proto.sk/#routeoptions

    Proposed solution

    Allow for the specification of these options on the Canary CR, or develop the ability for Flagger to act on route tables that have already been created.

    Any alternatives you've considered?

    We are unable to specify the options on the virtual service because they differ depending on the route prefix in our system.

    kind/enhancement 
    opened by amall015 0