MARIN3R

Lightweight, CRD based Envoy control plane for Kubernetes:

  • Implemented as a Kubernetes Operator
  • Deploy and manage an Envoy xDS server using the DiscoveryService custom resource
  • Inject Envoy sidecar containers based on Pod annotations
  • Deploy Envoy as a Kubernetes Deployment using the EnvoyDeployment custom resource
  • Dynamic Envoy configuration using the EnvoyConfig custom resource
  • Use any secret of type kubernetes.io/tls as a certificate source
  • Syntactic validation of Envoy configurations
  • Self-healing
  • Controls Envoy connection draining and graceful shutdown whenever pods are terminated

Overview

MARIN3R is a Kubernetes operator to manage a fleet of Envoy proxies within a Kubernetes cluster. It takes care of the deployment of the proxies and manages their configuration, feeding it to them through a discovery service using Envoy's xDS protocol. This allows for dynamic reconfiguration of the proxies without any reloads or restarts, favoring the ability to perform configuration changes in a non-disruptive way.

Users can write their Envoy configurations by making use of Kubernetes Custom Resources that the operator will watch and make available to the proxies through the discovery service. Configurations are defined making direct use of Envoy's v2/v3 APIs so anything supported in the Envoy APIs is available in MARIN3R. See the configuration section or the API reference for more details.

A great way to use this project is to have your own operator generating the Envoy configurations that your platform/service requires by making use of MARIN3R APIs. This way you can just focus on developing the Envoy configurations you need and let MARIN3R take care of the rest.

Getting started

Installation

MARIN3R can be installed either by using kustomize or by using Operator Lifecycle Manager (OLM). We recommend using OLM installation whenever possible.

Install using OLM

OLM is installed by default in Openshift 4.x clusters. For any other Kubernetes flavor, check if it is already installed in your cluster. If not, you can easily do so by following the OLM install guide.

Once OLM is installed in your cluster, you can proceed with the operator installation by applying the install manifests. This creates a namespaced install of MARIN3R, with the operator deployed in the marin3r-system namespace and watching resources only in the default namespace. Edit the spec.targetNamespaces field of the OperatorGroup resource in examples/quickstart/olm-install.yaml to change the namespaces that MARIN3R will watch. A cluster scoped installation through OLM is currently not supported (check the kustomize based installation for a cluster scoped install of the operator).

kubectl apply -f examples/quickstart/olm-install.yaml

Wait until you see the following Pods running:

▶ kubectl -n marin3r-system get pods | grep Running
marin3r-catalog-qsx9t                                             1/1     Running     0          103s
marin3r-controller-manager-5f97f86fc5-qbp6d                       2/2     Running     0          42s
marin3r-controller-webhook-5d4d855859-67zr6                       1/1     Running     0          42s
marin3r-controller-webhook-5d4d855859-6972h                       1/1     Running     0          42s

Install using kustomize

This method will install MARIN3R with cluster scope permissions in your cluster. It requires cert-manager to be present in the cluster.

To install cert-manager you can execute the following command in the root directory of this repository:

make deploy-cert-manager

You can also refer to the cert-manager install documentation.

Once cert-manager is available in the cluster, you can install MARIN3R by issuing the following command:

kustomize build config/default | kubectl apply -f -

After a while you should see the following Pods running:

▶ kubectl -n marin3r-system get pods
NAME                                          READY   STATUS    RESTARTS   AGE
marin3r-controller-manager-6c45f7675f-cs6dq   2/2     Running   0          31s
marin3r-controller-webhook-684bf5bbfd-cp2x4   1/1     Running   0          31s
marin3r-controller-webhook-684bf5bbfd-zdvrk   1/1     Running   0          31s

Deploy a discovery service

A discovery service is a Pod that users need to deploy in their namespaces to provide such namespaces with the ability to configure Envoy proxies dynamically using configurations loaded from Kubernetes Custom Resources. This Pod runs a couple of Kubernetes controllers as well as an Envoy xDS server. To deploy a discovery service users make use of the DiscoveryService custom resource that MARIN3R provides. The DiscoveryService is a namespace scoped resource, so one is required for each namespace where Envoy proxies are going to be deployed.

Continuing with our example, we are going to deploy a DiscoveryService resource in the default namespace of our cluster:

cat <<'EOF' | kubectl apply -f -
apiVersion: operator.marin3r.3scale.net/v1alpha1
kind: DiscoveryService
metadata:
  name: discoveryservice
  namespace: default
EOF

After a while you should see the discovery service Pod running:

▶ kubectl -n default get pods
NAME                                READY   STATUS    RESTARTS   AGE
marin3r-discoveryservice-676b5cd7db-xk9rt   1/1     Running   0          4s

Next steps

After installing the operator and deploying a DiscoveryService into a namespace, you are ready to start deploying and configuring Envoy proxies within the namespace. You can review the different walkthroughs within this repo to learn more about MARIN3R and its capabilities.

Configuration

API reference

The full MARIN3R API reference can be found here.

EnvoyConfig custom resource

MARIN3R's core functionality is to feed the Envoy configurations defined in EnvoyConfig custom resources to an Envoy discovery service. The discovery service then sends the resources contained in those configurations to the Envoy proxies that identify themselves with the same nodeID defined in the EnvoyConfig resource.

Commented example of an EnvoyConfig resource:

cat <<'EOF' | kubectl apply -f -
apiVersion: marin3r.3scale.net/v1alpha1
kind: EnvoyConfig
metadata:
  # name and namespace uniquely identify an EnvoyConfig but are
  # not relevant in any other way
  name: config
spec:
  # nodeID indicates that the resources defined in this EnvoyConfig are relevant
  # to Envoy proxies that identify themselves to the discovery service with the same
  # nodeID. The nodeID of an Envoy proxy can be set with the "--service-node" command
  # line flag
  nodeID: proxy
  # Resources can be written either in json or in yaml, with json being the
  # default if not specified
  serialization: json
  # Resources can be written using either v2 Envoy API or v3 Envoy API. Mixing v2 and v3 resources
  # in the same EnvoyConfig is not allowed. Default is v2.
  envoyAPI: v3
  # envoyResources is where users can write the different type of resources supported by MARIN3R
  envoyResources:
    # the "secrets" field holds references to Kubernetes Secrets. Only Secrets of type
    # "kubernetes.io/tls" can be referenced. Any certificate referenced from another Envoy
    # resource (for example a listener or a cluster) needs to be present here so marin3r
    # knows where to get the certificate from.
    secrets:
        # name is the name of the kubernetes Secret that holds the certificate and by which it can be
        # referenced from other resources
      - name: certificate
    # Endpoints is a list of the Envoy ClusterLoadAssignment resource type.
    # V2 reference: https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/endpoint.proto
    # V3 reference: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/endpoint/v3/endpoint.proto
    endpoints:
      - name: endpoint1
        value: {"clusterName":"cluster1","endpoints":[{"lbEndpoints":[{"endpoint":{"address":{"socketAddress":{"address":"127.0.0.1","portValue":8080}}}}]}]}
    # Clusters is a list of the Envoy Cluster resource type.
    # V2 reference: https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/cluster.proto
    # V3 reference: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/cluster/v3/cluster.proto
    clusters:
      - name: cluster1
        value: {"name":"cluster1","type":"STRICT_DNS","connectTimeout":"2s","loadAssignment":{"clusterName":"cluster1","endpoints":[]}}
    # Routes is a list of the Envoy Route resource type.
    # V2 reference: https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/route.proto
    # V3 reference: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route.proto
    routes:
      - name: route1
        value: {"name":"route1","virtual_hosts":[{"name":"vhost","domains":["*"],"routes":[{"match":{"prefix":"/"},"direct_response":{"status":200}}]}]}
    # Listeners is a list of the Envoy Listener resource type.
    # V2 reference: https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/listener.proto
    # V3 reference: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/listener/v3/listener.proto
    listeners:
      - name: listener1
        value: {"name":"listener1","address":{"socketAddress":{"address":"0.0.0.0","portValue":8443}}}
    # Runtimes is a list of the Envoy Runtime resource type.
    # V2 reference: https://www.envoyproxy.io/docs/envoy/latest/api-v2/service/discovery/v2/rtds.proto
    # V3 reference: https://www.envoyproxy.io/docs/envoy/latest/api-v3/service/runtime/v3/rtds.proto
    runtimes:
      - name: runtime1
        value: {"name":"runtime1","layer":{"static_layer_0":"value"}}

Secrets

Secrets are treated in a special way by MARIN3R as they contain sensitive information. Instead of directly declaring an Envoy API secret resource in the EnvoyConfig CR, you have to reference a Kubernetes Secret, which must exist in the same namespace. MARIN3R expects this Secret to be of type kubernetes.io/tls and will load it into an Envoy secret resource. This avoids having to insert sensitive data into the EnvoyConfig resources and allows you to use your regular Kubernetes Secret management workflow for sensitive data.

Another approach is to create the certificates with cert-manager, since cert-manager also stores the certificates it generates in Secrets of type kubernetes.io/tls. You just need to point the references in your EnvoyConfig to the proper cert-manager generated Secret.
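
For reference, this is a minimal sketch of a cert-manager Certificate, assuming cert-manager v1 is installed and an Issuer named selfsigned-issuer already exists in the namespace (the issuer and DNS name below are placeholders):

cat <<'EOF' | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: certificate
  namespace: default
spec:
  # cert-manager stores the generated certificate in a kubernetes.io/tls
  # Secret with this name, which is what the EnvoyConfig references
  secretName: certificate
  dnsNames:
    - example.default.svc.cluster.local
  issuerRef:
    name: selfsigned-issuer
    kind: Issuer
EOF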

To use a certificate from a Kubernetes Secret, reference it from an EnvoyConfig like this:

spec:
  envoyResources:
    secrets:
      - name: certificate

This certificate can then be referenced in an Envoy cluster/listener with the following snippet (check the kuard example):

transport_socket:
  name: envoy.transport_sockets.tls
  typed_config:
    "@type": "type.googleapis.com/envoy.api.v2.auth.DownstreamTlsContext"
    common_tls_context:
      tls_certificate_sds_secret_configs:
        - name: certificate
          sds_config:
            ads: {}
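
The snippet above uses a v2 type URL. For v3 configurations, the equivalent typed_config type is envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext; a minimal sketch (whether resource_api_version is required depends on your Envoy version):

transport_socket:
  name: envoy.transport_sockets.tls
  typed_config:
    "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext"
    common_tls_context:
      tls_certificate_sds_secret_configs:
        - name: certificate
          sds_config:
            ads: {}
            resource_api_version: V3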

Sidecar injection configuration

The MARIN3R mutating admission webhook will inject Envoy sidecar containers into any Pod annotated with marin3r.3scale.net/node-id and labeled with marin3r.3scale.net/status=enabled. The following annotations can be used in Pods to control the behavior of the sidecar injection (a complete example is shown at the end of this section):

| annotation | description | default value |
| --- | --- | --- |
| marin3r.3scale.net/node-id | Envoy's node-id | N/A |
| marin3r.3scale.net/cluster-id | Envoy's cluster-id | same as node-id |
| marin3r.3scale.net/envoy-api-version | Envoy's API version (v2/v3) | v2 |
| marin3r.3scale.net/container-name | the name of the Envoy sidecar | envoy-sidecar |
| marin3r.3scale.net/ports | the exposed ports in the Envoy sidecar | N/A |
| marin3r.3scale.net/host-port-mappings | Envoy sidecar ports that will be mapped to the host. This is used for local development and is not recommended for production use. | N/A |
| marin3r.3scale.net/envoy-image | the Envoy image to be used in the injected sidecar container | envoyproxy/envoy:v1.14.1 |
| marin3r.3scale.net/config-volume | the Pod volume where the ads-configmap will be mounted | envoy-sidecar-bootstrap |
| marin3r.3scale.net/tls-volume | the Pod volume where the marin3r client certificate will be mounted | envoy-sidecar-tls |
| marin3r.3scale.net/client-certificate | the marin3r client certificate used to authenticate to the marin3r control plane (marin3r uses mTLS) | envoy-sidecar-client-cert |
| marin3r.3scale.net/envoy-extra-args | extra command line arguments to pass to the Envoy sidecar container | "" |
| marin3r.3scale.net/admin.port | Envoy's admin api port | 9901 |
| marin3r.3scale.net/admin.bind-address | Envoy's admin api bind address | 0.0.0.0 |
| marin3r.3scale.net/admin.access-log-path | Envoy's admin api access logs path | /dev/null |
| marin3r.3scale.net/resources.limits.cpu | Envoy sidecar container cpu limits. See syntax format to specify the resource quantity | N/A |
| marin3r.3scale.net/resources.limits.memory | Envoy sidecar container memory limits. See syntax format to specify the resource quantity | N/A |
| marin3r.3scale.net/resources.requests.cpu | Envoy sidecar container cpu requests. See syntax format to specify the resource quantity | N/A |
| marin3r.3scale.net/resources.requests.memory | Envoy sidecar container memory requests. See syntax format to specify the resource quantity | N/A |
| marin3r.3scale.net/shutdown-manager.enabled | enables or disables the Envoy shutdown manager for graceful shutdown of the Envoy server (true/false) | false |
| marin3r.3scale.net/shutdown-manager.port | Envoy's shutdown manager server port | 8090 |
| marin3r.3scale.net/shutdown-manager.image | Envoy's shutdown manager image | if unset, the operator selects the appropriate image |
| marin3r.3scale.net/init-manager.image | Envoy's init manager image | if unset, the operator selects the appropriate image |
| marin3r.3scale.net/shutdown-manager.extra-lifecycle-hooks | comma-separated list of container names whose stop should be coordinated with the shutdown manager. Typically used for containers that act as upstream clusters for the Envoy sidecar | N/A |

marin3r.3scale.net/ports syntax

The port syntax is a comma-separated list of name:port[:protocol] as in "envoy-http:1080,envoy-https:1443".

marin3r.3scale.net/host-port-mappings syntax

The host-port-mappings syntax is a comma-separated list of container-port-name:host-port-number as in "envoy-http:1080,envoy-https:1443".
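
Putting the annotations together, the following is a minimal sketch of a Deployment that gets an Envoy sidecar injected (kuard is used here only as an example workload; the node-id must match the nodeID of an EnvoyConfig in the same namespace):

cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kuard
  namespace: default
  labels:
    app: kuard
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kuard
  template:
    metadata:
      labels:
        app: kuard
        # this label enables sidecar injection for the Pod
        marin3r.3scale.net/status: "enabled"
      annotations:
        # must match the nodeID of an EnvoyConfig in this namespace
        marin3r.3scale.net/node-id: kuard
        # ports exposed by the Envoy sidecar
        marin3r.3scale.net/ports: envoy-https:8443
    spec:
      containers:
        - name: kuard
          image: gcr.io/kuar-demo/kuard-amd64:blue
          ports:
            - containerPort: 8080
              name: http
              protocol: TCP
EOF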

Use cases

Ratelimit

Design docs

For an in-depth look at how MARIN3R works, check the design docs.

Discovery service

Sidecar injection

Operator

Development

You can find development documentation here.

Release

You can find release process documentation here.

Issues
  • Remove API v2 code and EnvoyBootstrap code

    This is a code cleanup PR:

    • Removes all the code related to Envoy V2 API. V3 was already the default API so this is the natural step in the deprecation of V2, which has already been removed from https://github.com/envoyproxy/go-control-plane, a dependency of this project.
    • Removes the EnvoyBootstrap controller, whose use was deprecated in 0.8.
    • Moves the discovery service command code under cmd/ so the structure of the code is the same for all subcommands.

    /kind cleanup /priority important-soon /assign

    ok-to-test lgtm approved size/XL priority/important-soon kind/cleanup 
    opened by roivaz 12
  • feat/autogenerate-proto-imports

    Context: There is a list of imports in pkg/envoy/serializer/v3/serializer.go that is required so the serialization/deserialization code is able to handle proto messages of the Any type. This is a dynamic list of imports as the files containing protobuffer definitions in go-control-plane can change from version to version. So far this list of imports was manually maintained, which is problematic as the list could end up being out of date.

    This PR automates the process of generating the list of imports:

    • The list is now maintained in a separate package pkg/envoy/protos/v3 that can be imported from other packages.
    • A generator has been written that performs the following tasks:
      • Inspects the project's go.mod to determine the go-control-plane release in use.
      • Clones the specific tag of the go-control-plane repository into memory and looks for the .pb.go files that belong to the v3 api version (though the generator is already able to look for other api versions).
      • Generates the file with all the imports within the pkg/envoy/protos/v3 package.
    • go generate is used to trigger the execution of the code generator from the Makefile when the binary is built.

    Currently the list of imports is out of date in marin3r-v0.9.0 so a new patch release will be required after this PR as some proto message definitions are missing, resulting in an error if a user tries to use them.

    /kind feature /kind bug /priority important-soon /assign

    ok-to-test lgtm approved kind/feature priority/important-soon size/L kind/bug 
    opened by roivaz 11
  • feat/upgrade-deps

    This PR upgrades libraries and operator-sdk to the latest possible versions, within constraints. Operator SDK manifest generation has been refactored for better maintenance, making it clear where the manifests have been customized within the Kustomize resources. Most of the operator-sdk project scaffolding has been regenerated using the latest version.

    Operator SDK has been bumped only to 1.10 because there is an ongoing problem with higher versions that remains unresolved, even though the reported issue is already closed: https://github.com/operator-framework/operator-sdk/issues/5244

    /kind feature /priority important-soon /assign

    ok-to-test lgtm approved kind/feature size/XL priority/important-soon 
    opened by roivaz 11
  • feat/default-v3

    Given that the latest releases of Envoy have dropped support for the v2 API and that Envoy 1.18.3 is currently MARIN3R's default, set v3 as the default API to use.

    I also deleted some unused code that was still around.

    /kind feature /priority important-soon /assign

    ok-to-test lgtm approved kind/feature size/M priority/important-soon 
    opened by roivaz 10
  • Add a name to the V3 file-based SDS tls secret response

    This applies only to the V3 protocol.

    This impacts the tls_certificate_sds_secret.json config file, not the response sent over the wire to xDS clients.

    Marin3r generates this file and adds it to a k8s secret which is mounted into Envoy pods. Envoy uses this file to retrieve the bootstrap certs that allow it to talk to Marin3r over TLS. If there's no "name" field then Envoy v1.17.0 can't process the cert, although older Envoy versions can.

    [2021-02-18 23:20:51.419][1][critical][main] [source/server/server.cc:109] error initializing configuration '/etc/envoy/bootstrap/config.json': Proto constraint validation failed (UpstreamTlsContextValidationError.CommonTlsContext: ["embedded message failed validation"] | caused by CommonTlsContextValidationError.TlsCertificateSdsSecretConfigs[i]: ["embedded message failed validation"] | caused by SdsSecretConfigValidationError.Name: ["value length must be at least " '\x01' " runes"]): common_tls_context { tls_certificate_sds_secret_configs { sds_config { path: "/etc/envoy/bootstrap/tls_certificate_sds_secret.json" } } }

    [2021-02-18 23:20:51.419][1][info][main] [source/server/server.cc:782] exiting Proto constraint validation failed (UpstreamTlsContextValidationError.CommonTlsContext: ["embedded message failed validation"] | caused by CommonTlsContextValidationError.TlsCertificateSdsSecretConfigs[i]: ["embedded message failed validation"] | caused by SdsSecretConfigValidationError.Name: ["value length must be at least " '\x01' " runes"]): common_tls_context { tls_certificate_sds_secret_configs { sds_config { path: "/etc/envoy/bootstrap/tls_certificate_sds_secret.json" } } }

    Thank you @roivaz for pointing me to this fix!

    ok-to-test 
    opened by acnodal-tc 10
  • Rename field 'podAffinity' to just 'affinity' in EnvoyDeployment resource

    'podAffinity' was a poor choice of name for the field because it is basically the affinity field of a Pod spec, so it's best if the naming is the same. This field has not yet made it to a stable release, so there is no problem in changing it.

    /kind feature /priority important-soon /assign

    ok-to-test lgtm approved kind/feature size/M priority/important-soon 
    opened by roivaz 9
  • Reimplement self-healing using internal statistics

    This PR includes the following:

    • Fixed a bug affecting reconcile of EnvoyConfigRevision status: d7687ea600d0293bb62e1146cd195e3e5d631f87
    • Implemented a mechanism to internally store statistics related to the xDS protocol messages interchanged between clients and the discovery service: 24e5b59e4636612f11adacf9b7ed5a471d2cc9af
    • Reimplemented the self-healing using the internal xDS stats: 1b17645cd666ac48be2d399189b56fe483b9e8d4
    • Implemented a backoff algorithm to avoid overloading the envoy clients with retries from the discovery service: 86d1b6dc8ff8aa8f3d19040cebf757f177a37269

    /kind feature /priority important-soon /assign

    ok-to-test lgtm approved kind/feature size/XL priority/important-soon 
    opened by roivaz 9
  • feat/shutdown-manager

    This PR adds graceful termination for Envoy containers, with connection draining of listeners.

    The shutdown manager can be enabled for EnvoyDeployment resources using:

    spec:
      shutdownManager: {}
    

    The shutdown manager can be enabled for envoy injected sidecars using the following annotation in Pods:

    metadata:
      annotations:
        marin3r.3scale.net/shutdown-manager.enabled: "true"
    

    The shutdown manager is a new command in the Marin3r image that runs a small server and is deployed as a sidecar container to the Envoy container. Container lifecycle hooks are used in both the shutdown manager container and the Envoy container to orchestrate graceful shutdown of the Envoy server, waiting until all the listeners are drained or the 300s timeout is reached.

    An example of how the Envoy and the shutdown manager containers are configured:

      containers:
        - name: envoy 
          args:
            - '-c'
            - /etc/envoy/bootstrap/config.json
            - '--service-node'
            - example
            - '--service-cluster'
            - example
            - '--component-log-level'
            - 'config:debug'
          command:
            - envoy
          image: 'envoyproxy/envoy:v1.16.0'
          lifecycle:
            preStop:
              httpGet:
                path: /shutdown
                port: 8090
                scheme: HTTP
          # rest of the container config is omitted
        - name: envoy-shtdn-mgr 
          args:
            - shutdown-manager
            - '--port'
            - '8090'
          image: 'quay.io/3scale/marin3r:v0.8.0-alpha.8'
          lifecycle:
            preStop:
              httpGet:
                path: /drain
                port: 8090
                scheme: HTTP
          # rest of the container config is omitted
    

    /kind feature /priority important-soon

    ok-to-test lgtm approved kind/feature size/XL priority/important-soon 
    opened by roivaz 9
  • Release/v0.10.0

    Release v0.10.0.

    A small fix has been added to the package generators to wipe the generated file contents before writing.

    /kind feature /priority important-soon /assign

    ok-to-test lgtm approved kind/feature size/M priority/important-soon 
    opened by roivaz 8
  • feat/extra-container-lifecycle-hooks

    This PR allows extra containers within the Pod to coordinate with the shutdown manager. A new annotation has been added that allows a user to specify other container names where the shutdown manager lifecycle hook should also be configured. This is a feature that only makes sense for sidecars.

    An example of usage:

    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: kuard
      namespace: default
      labels:
        app: kuard
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: kuard
      template:
        metadata:
          labels:
            app: kuard
            marin3r.3scale.net/status: "enabled"
          annotations:
            marin3r.3scale.net/node-id: kuard
            marin3r.3scale.net/ports: envoy-https:8443
            marin3r.3scale.net/shutdown-manager.enabled: "true"
            marin3r.3scale.net/shutdown-manager.extra-lifecycle-hooks: kuard
        spec:
          containers:
            - name: kuard
              image: gcr.io/kuar-demo/kuard-amd64:blue
              ports:
                - containerPort: 8080
                  name: http
                  protocol: TCP
    

    /kind feature /priority important-soon /assign

    ok-to-test lgtm approved kind/feature size/M needs-priority 
    opened by roivaz 8
  • Correctly sort ECRs when there are only two in the list

    SortByPublication had a bug where it was checking only one of the candidates' versions but not the other. If there were only two candidates to be sorted, one candidate's version wasn't being examined, so the sort ended up being based on publish date only, ignoring the version.

    Now we check both candidates' versions, so the version is always the primary sort key, even when there are only two candidates.

    ok-to-test lgtm approved needs-priority kind/bug size/S 
    opened by acnodal-tc 7
  • EnvoyConfig InSync but pods having different TLS certificates

    What happened

    In the staging environment, we had intermittent alerts about the HTTP certificate expiration date of monitored VIP HTTP endpoints with blackbox_exporter:

            - alert: ProbeSSLCertExpire
              expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 14
              for: 5m
              labels:
                severity: critical
              annotations:
                message: "SSL certificate from probe target {{ $labels.target }} is going to expire in 14 days"
    

    This means that the certificate exposed by the Envoy sidecar was different among pods belonging to the same deployment.

    The EnvoyConfig related to all those endpoints was InSync, with DESIRED VERSION the same as PUBLISHED VERSION:

    $ oc get envoyconfig
    NAME                            NODE ID                         ENVOY API   DESIRED VERSION   PUBLISHED VERSION   CACHE STATE
    mt-ingress                      mt-ingress                      v3          b98554f7b         b98554f7b           InSync
    stg-saas-apicast-production     stg-saas-apicast-production     v3          6669946454        6669946454          InSync
    stg-saas-apicast-staging        stg-saas-apicast-staging        v3          7bcb8576c8        7bcb8576c8          InSync
    stg-saas-backend-listener       stg-saas-backend-listener       v3          746b79f77b        746b79f77b          InSync
    stg-saas-echo-api               stg-saas-echo-api               v3          68f954f6c9        68f954f6c9          InSync
    

    echo-api, backend, mt-ingress and apicast-production had 50% of their pods (1 pod out of 2) with the oldest TLS certificate, so the only correct one was apicast-staging.

    AFAIK, when cert-manager updates a certificate in a k8s Secret, the marin3r ServiceDiscovery should update every sidecar via the xDS API and receive a NACK from every pod. If I recall correctly, in the past a single pod NACK was enough to mark the EnvoyConfig as InSync, but lately it was changed to a percentage https://github.com/3scale-ops/marin3r/commit/1b17645cd666ac48be2d399189b56fe483b9e8d4

    However, in this case 50% of the pods were not updated and their EnvoyConfig was InSync.

    How we checked the certificate from every pod

    We checked for the oldest certificate by port-forwarding the envoy admin port:

    $ oc port-forward backend-listener-67d4786b64-lx682 9901:9901
    Forwarding from 127.0.0.1:9901 -> 9901
    Forwarding from [::1]:9901 -> 9901
    

    And accessing the certs page at http://127.0.0.1:9901/certs, where we could check the cert expiration date.

    When we detected an out-of-sync pod, we deleted it; the new pod created in its place contained the correct cert (like the one on the other pod of the deployment, as this happened to 1 pod out of 2 per deployment).

    In production environment this issue is not happening.

    Workaround to be alerted

    For the moment, we have added an alert to be notified when there is a certificate expiration drift for the same HTTP target (changes over time detected with a rate function, checked in the staging environment):

            - alert: ProbeSSLCertExpireDrift
              expr: rate(probe_ssl_earliest_cert_expiry[5m]) > 0
              for: 10m
              labels:
                severity: critical
              annotations:
                message: "SSL certificate from probe target {{ $labels.target }} is showing different certificate expiration dates, maybe some pods are not loading a renewed certificate"
    

    How to reproduce

    We haven't seen any error on ServiceDiscovery, so unfortunately we don't know how to reproduce it :(

    needs-size needs-priority kind/bug 
    opened by slopezz 3
  • Marin3r endpoint auto-discovery

    Why

    To use kubernetes endpoints instead of services to improve load balancing decisions.

    /kind feature /priority important-longterm /label size/xl /assign

    kind/feature size/XL priority/important-longterm 
    opened by raelga 0
  • Add liveness/readiness probes for the discovery service

    Similarly to #45, we need to add readiness/liveness probes to the discovery service Deployment. In this case the endpoints provided by controller-runtime are not sufficient, as we also need to assess the health of the discovery service server and somehow aggregate both results in the same endpoint.

    kind/feature 
    opened by roivaz 0
  • HA for the discovery service server

    Right now the discovery service server runs in a single pod. This is not optimal because, if new pods are created while the discovery service pod is down, they will fail. Already running pods are not affected though.

    The proposal would be to move the EnvoyConfig controller to the operator pod and leave just the EnvoyConfigRevision controller in the discovery service pod. This has some problems that would need to be solved:

    • The status would need to have more intelligence as we need to assess that all discovery service pods have synced their cache before declaring an EnvoyConfig cacheStatus as "InSync".
    kind/feature 
    opened by roivaz 0
  • Operator to manage DiscoveryService certificates

    A solution is needed to manage the renewal of the DiscoveryService related certificates:

    • The CA
    • The server certificate
    • The client certificates

    Currently all these certificates are just created but never reconciled/renewed so manual action is required to renew them.

    kind/feature 
    opened by roivaz 1
  • Check snapshot consistency before writing to the xDS cache in the EnvoyConfigRevision controller

    Check snapshot consistency using "snap.Consistent()" function provided by the cache implementation of go-control-plane. Consider also validating consistency between clusters and listeners, as this is not done by snap.Consistent() because listeners and clusters are not requested by name by the envoy gateways.

    Relevant file: https://github.com/3scale/marin3r/blob/master/pkg/controller/envoyconfigrevision/envoyconfigrevision_controller.go

    kind/feature 
    opened by roivaz 1
Releases (v0.10.0)
  • v0.10.0(Jan 26, 2022)

    Breaking changes

    • Envoy 1.20.1 is now the default #132. The Envoy version must be set explicitly to avoid an upgrade to the new default when the operator is upgraded. @roivaz
    • The Protocol Buffers implementation has been migrated from github.com/golang/protobuf, which is deprecated, to google.golang.org/protobuf (#134). This introduces changes in how Envoy resources are serialized/deserialized to/from json/yaml and configurations that worked with older versions of MARIN3R might not work after the upgrade. One such difference has been detected with Protocol Buffer durations, which are now only accepted in seconds (s) or nanoseconds (ns). It is recommended that all configurations are tested in a non-production environment before upgrading to MARIN3R v0.10.0. @roivaz

    New features

    • Support for Envoy scoped routes has been added (#135). @roivaz
    • Connection draining time and strategy are now configurable in the Shutdown Manager, both for sidecars and EnvoyDeployments (#133). @roivaz

    Other changes

    • Use the built image as the default for the Init Manager, Shutdown Manager and Discovery Service (#130) @acnodal-tc .
    • Reduce the size of the docker build context (#131) @acnodal-tc .
    • Add support to override the default image through an environment variable in the operator Pod (#132). @roivaz
    • Upgraded project dependencies (#132) @roivaz
    Source code(tar.gz)
    Source code(zip)
  • v0.9.1(Nov 4, 2021)

  • v0.9.0(Oct 21, 2021)

    Breaking changes

    • Envoy 1.20.0 is now the default #120. You must explicitly set the Envoy version if you want to avoid upgrading or use a different version.
    • Envoy configuration API v2 has been deprecated in favor of v3 #118. Users still using v2 configuration API must perform the following steps before upgrading:
      • Migrate all EnvoyConfigs still using v2 config API to v3. The process of migration is described here.
      • Delete any remaining v2 EnvoyConfigRevision. You can list them using kubectl get envoyconfigrevisions -A -l marin3r.3scale.net/envoy-api=v2.

    Other changes

    • Upgraded operator-sdk and project dependencies #119.
    Source code(tar.gz)
    Source code(zip)
  • v0.8.0(Jul 30, 2021)

    Breaking changes

    • Envoy v3 config API is now the default #112. If you are using v2 configurations you must either upgrade them to v3 before upgrading the operator or explicitly set v2 in both your EnvoyConfigs and Pod annotations to continue using v2.
    • Envoy 1.18.3 is now the default #105. You must explicitly set the Envoy version if you want to avoid upgrading or use a different version. Note that the last Envoy release supporting the v2 config API is v1.16.

    New features

    • Added new resource EnvoyDeployment to deploy Envoy as a kubernetes Deployment #89 #92.
    • Added validation of EnvoyConfig resources #79.
    • Added the shutdown manager to handle connection draining upon termination of Envoy containers #95 #111 #113.
    • The Envoy static configuration required to provide the initial configuration for Envoy to talk to the discovery service is now generated in an init container. This init container also adds Pod metadata that can then be interpreted by the discovery service. The usage of a shared ConfigMap to do this has been deprecated. This change is transparent to the user and no changes are required. #100 #110 .
    • Each Envoy resource type is now internally versioned separately by the discovery service. This improves performance as it reduces the number of resource changes that the discovery service sends to the Envoy clients #101.
    • Refactored the self-healing capabilities using Pod metadata that is now available to the controllers to make the self-healing much more robust #102.

    Bugfixing and minor improvements

    • Field spec.envoyResources.secrets.[*].ref is now optional in EnvoyConfig resources #61.
    • Fix some panics coming from operator-utils library #93 #108 operator-utils#69 operator-utils#68.
    • Move subcommands into a different package #97.
    • Fix webhook in multi-namespaced install mode #103.
    • Fix the file-based SDS tls secret response used in Envoy's static configuration in Envoy v1.17+ #104 (thanks @acnodal-tc ).

    Other changes

    • Upgraded the project to use kubebuilder v3 #78.
    • Upgraded project dependencies #80.
    • CI moved to Github Actions #86.
    • Documentation updates.
    Source code(tar.gz)
    Source code(zip)
  • v0.7.0(Jan 22, 2021)

    Changelog

    • The discovery service supports now both v2 and v3 envoy API versions (https://github.com/3scale/marin3r/pull/48)
    • Non-disruptive upgrade of EnvoyConfigs from v2 to v3 and vice versa (https://github.com/3scale/marin3r/pull/56, https://github.com/3scale/marin3r/pull/57)
    • DiscoveryService is now a namespaced resource to allow for namespaced installation of the operator (https://github.com/3scale/marin3r/pull/60)
    • Code improvements to all controllers, with improved test coverage (https://github.com/3scale/marin3r/pull/64, https://github.com/3scale/marin3r/pull/66, https://github.com/3scale/marin3r/pull/67, https://github.com/3scale/marin3r/pull/68, https://github.com/3scale/marin3r/pull/70)
    • Fix a bug where an EnvoyConfigRevision could get tainted if a referred Secret was temporarily unavailable (https://github.com/3scale/marin3r/pull/65)
    • Added design and development docs. See https://github.com/3scale/marin3r/tree/master/docs

    Image

    quay.io/3scale/marin3r:v0.7.0

    Source code(tar.gz)
    Source code(zip)
  • v0.2.0(Jun 9, 2020)

    Changelog

    • Integrate project with operator-sdk
    • Cache is now implemented as several CRDs and controllers that are in charge of:
      • Keeping the control-plane cache in sync with the resources spec in the CRD
      • Reporting the state to the user
      • Self-healing when gateway failures are detected

    Image

    Source code(tar.gz)
    Source code(zip)
  • v0.1.4(Apr 17, 2020)

  • v0.1.3(Apr 16, 2020)

  • v0.1.2(Apr 16, 2020)

  • v0.1.1(Apr 15, 2020)

  • v0.1.0(Apr 14, 2020)
