The DataStax Kubernetes Operator for Apache Cassandra

Overview

Cass Operator

License: Apache License 2.0

The DataStax Kubernetes Operator for Apache Cassandra®. This repository replaces the old datastax/cass-operator for use cases in the k8ssandra project. Some documentation is still out of date and will be updated in the future. Check k8ssandra/k8ssandra for more up-to-date information.

Getting Started

To create a full-featured cluster, the recommended approach is to use the Helm charts from k8ssandra. Check the Getting Started documentation at k8ssandra.io (https://k8ssandra.io/docs).

Quick start:

# *** This is for GKE Regular Channel - k8s 1.16 -> Adjust based on your cloud or storage options
kubectl create -f https://raw.githubusercontent.com/k8ssandra/cass-operator/v1.7.1/docs/user/cass-operator-manifests.yaml
kubectl create -f https://raw.githubusercontent.com/k8ssandra/cass-operator/v1.7.1/operator/k8s-flavors/gke/storage.yaml
kubectl -n cass-operator create -f https://raw.githubusercontent.com/k8ssandra/cass-operator/v1.7.1/operator/example-cassdc-yaml/cassandra-3.11.x/example-cassdc-minimal.yaml

Loading the operator

Installing the Cass Operator itself is straightforward. Apply the relevant manifest to your cluster as follows:

kubectl apply -f https://raw.githubusercontent.com/k8ssandra/cass-operator/v1.7.1/docs/user/cass-operator-manifests.yaml

Note that since the manifest will install a Custom Resource Definition, the user running the above command will need cluster-admin privileges.

This will deploy the operator, along with any requisite resources such as Role, RoleBinding, etc., to the cass-operator namespace. You can check to see if the operator is ready as follows:

$ kubectl -n cass-operator get pods --selector name=cass-operator
NAME                             READY   STATUS    RESTARTS   AGE
cass-operator-555577b9f8-zgx6j   1/1     Running   0          25h
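If you prefer to block until the operator is up rather than polling pods, kubectl wait works as well (a quick sketch, assuming the Deployment created by the manifest is named cass-operator, matching the pod name prefix above):

kubectl -n cass-operator wait --for=condition=Available deployment/cass-operator --timeout=120s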

Creating a storage class

You will need to create an appropriate storage class, which defines the type of storage to use for Cassandra nodes in a cluster. For example, here is a storage class for using SSDs in GKE, which you can also find at operator/k8s-flavors/gke/storage.yaml:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: server-storage
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  replication-type: none
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete

Apply the above as follows:

kubectl apply -f https://raw.githubusercontent.com/k8ssandra/cass-operator/v1.7.1/operator/k8s-flavors/gke/storage.yaml
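If you are not on GKE, the same pattern applies with your platform's provisioner. For example, here is a minimal sketch for SSD-backed volumes on AWS (assuming the in-tree kubernetes.io/aws-ebs provisioner; adjust the parameters for your environment, and keep the name server-storage so the example CassandraDatacenter below works unchanged):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: server-storage
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete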

Creating a CassandraDatacenter

The following resource defines a Cassandra 3.11.7 datacenter with 3 nodes on one rack, which you can also find at operator/example-cassdc-yaml/cassandra-3.11.x/example-cassdc-minimal.yaml:

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: cluster1
  serverType: cassandra
  serverVersion: 3.11.7
  managementApiAuth:
    insecure: {}
  size: 3
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: server-storage
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
  config:
    cassandra-yaml:
      authenticator: org.apache.cassandra.auth.PasswordAuthenticator
      authorizer: org.apache.cassandra.auth.CassandraAuthorizer
      role_manager: org.apache.cassandra.auth.CassandraRoleManager
    jvm-options:
      initial_heap_size: 800M
      max_heap_size: 800M

Apply the above as follows:

kubectl -n cass-operator apply -f https://raw.githubusercontent.com/k8ssandra/cass-operator/v1.7.1/operator/example-cassdc-yaml/cassandra-3.11.x/example-cassdc-minimal.yaml

You can check the status of pods in the Cassandra cluster as follows:

$ kubectl -n cass-operator get pods --selector cassandra.datastax.com/cluster=cluster1
NAME                         READY   STATUS    RESTARTS   AGE
cluster1-dc1-default-sts-0   2/2     Running   0          26h
cluster1-dc1-default-sts-1   2/2     Running   0          26h
cluster1-dc1-default-sts-2   2/2     Running   0          26h

You can check to see the current progress of bringing the Cassandra datacenter online by checking the cassandraOperatorProgress field of the CassandraDatacenter's status sub-resource as follows:

$ kubectl -n cass-operator get cassdc/dc1 -o "jsonpath={.status.cassandraOperatorProgress}"
Ready

(cassdc and cassdcs are supported short forms of CassandraDatacenter.)

A value of "Ready", as above, means the operator has finished setting up the Cassandra datacenter.

You can also check the Cassandra cluster status using nodetool by invoking it on one of the pods in the cluster as follows:

$ kubectl -n cass-operator exec -it -c cassandra cluster1-dc1-default-sts-0 -- nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving/Stopped
--  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.233.105.125  224.82 KiB  1            65.4%             5e29b4c9-aa69-4d53-97f9-a3e26115e625  r1
UN  10.233.92.96    186.48 KiB  1            61.6%             b119eae5-2ff4-4b06-b20b-c492474e59a6  r1
UN  10.233.90.54    205.1 KiB   1            73.1%             0a96e814-dcf6-48b9-a2ca-663686c8a495  r1

The operator creates a secure Cassandra cluster by default, with a new superuser (not the traditional cassandra user) and a random password. You can get those out of a Kubernetes secret and use them to log into your Cassandra cluster for the first time. For example:

$ # get CASS_USER and CASS_PASS variables into the current shell
$ CASS_USER=$(kubectl -n cass-operator get secret cluster1-superuser -o json | jq -r '.data.username' | base64 --decode)
$ CASS_PASS=$(kubectl -n cass-operator get secret cluster1-superuser -o json | jq -r '.data.password' | base64 --decode)
$ kubectl -n cass-operator exec -ti cluster1-dc1-default-sts-0 -c cassandra -- sh -c "cqlsh -u '$CASS_USER' -p '$CASS_PASS'"

Connected to cluster1 at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.6 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.

cluster1-superuser@cqlsh> select * from system.peers;

 peer      | data_center | host_id                              | preferred_ip | rack    | release_version | rpc_address | schema_version                       | tokens
-----------+-------------+--------------------------------------+--------------+---------+-----------------+-------------+--------------------------------------+--------------------------
 10.28.0.4 |         dc1 | 4bf5e110-6c19-440e-9d97-c013948f007c |         null | default |          3.11.6 |   10.28.0.4 | e84b6a60-24cf-30ca-9b58-452d92911703 | {'-7957039572378599263'}
 10.28.5.5 |         dc1 | 3e84b0f1-9c1e-4deb-b6f8-043731eaead4 |         null | default |          3.11.6 |   10.28.5.5 | e84b6a60-24cf-30ca-9b58-452d92911703 | {'-3984092431318102676'}

(2 rows)
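Once logged in as the superuser, standard CQL works as usual. For example, here is a minimal sketch that creates an application keyspace replicated across the dc1 datacenter defined above (the keyspace name is illustrative):

CREATE KEYSPACE IF NOT EXISTS my_keyspace
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};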

Installing cluster via Helm

To install a cluster with optional integrated backup/restore and repair utilities, check the k8ssandra/k8ssandra helm charts project.

If you wish to install only the cass-operator, you can run the following command:

helm repo add k8ssandra https://helm.k8ssandra.io/stable
helm install k8ssandra k8ssandra/k8ssandra --set cassandra.enabled=false --set reaper.enabled=false --set reaper-operator.enabled=false --set stargate.enabled=false --set kube-prometheus-stack.enabled=false

You can then apply your CassandraDatacenter.

Custom Docker registry example: GitHub Packages

GitHub Packages may be used as a custom Docker registry.

First, create a GitHub personal access token.

See:

https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token

Second, use the access token to create the registry Secret:

kubectl create secret docker-registry github-docker-registry --docker-username=USERNAME --docker-password=ACCESSTOKEN --docker-server docker.pkg.github.com

Replace USERNAME with your GitHub username and ACCESSTOKEN with the personal access token.
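If your CassandraDatacenter pulls images from that registry, you can reference the secret from its podTemplateSpec (a sketch; cass-operator merges podTemplateSpec into the StatefulSets it generates, so the standard imagePullSecrets pod field should carry through):

spec:
  podTemplateSpec:
    spec:
      imagePullSecrets:
      - name: github-docker-registry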

Features

  • Proper token ring initialization, with only one node bootstrapping at a time
  • Seed node management - one per rack, or three per datacenter, whichever is more
  • Server configuration integrated into the CassandraDatacenter CRD
  • Rolling reboot nodes by changing the CRD
  • Store data in a rack-safe way - one replica per cloud AZ
  • Scale up racks evenly with new nodes
  • Scale down racks evenly by decommissioning existing nodes
  • Replace dead/unrecoverable nodes
  • Multi DC clusters (limited to one Kubernetes namespace)

All features are documented in the User Documentation.

Containers

The operator is built from several container images working in concert.

Overriding properties of cass-operator created Containers

If the CassandraDatacenter specifies a podTemplateSpec field, containers with specific names can be used to override the default settings in the containers that cass-operator creates.

Currently, cass-operator creates an InitContainer named "server-config-init". The normal Containers it creates are named "cassandra", "server-system-logger", and optionally "reaper".

In general, the values specified in this way by the user will override anything generated by cass-operator.

Of special note is that user-specified environment variables, ports, and volumes in the corresponding containers will be added to the values that cass-operator automatically generates for those containers.

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: cluster1
  serverType: cassandra
  serverVersion: 3.11.7
  managementApiAuth:
    insecure: {}
  size: 3
  podTemplateSpec:
    spec:
      initContainers:
        - name: "server-config-init"
          env:
          - name: "EXTRA_PARAM"
            value: "123"
      containers:
        - name: "cassandra"
          terminationMessagePath: "/dev/other-termination-log"
          terminationMessagePolicy: "File"
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: server-storage
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
  config:
    cassandra-yaml:
      authenticator: org.apache.cassandra.auth.PasswordAuthenticator
      authorizer: org.apache.cassandra.auth.CassandraAuthorizer
      role_manager: org.apache.cassandra.auth.CassandraRoleManager
    jvm-options:
      initial_heap_size: 800M
      max_heap_size: 800M

Requirements

  • Kubernetes cluster, 1.16 or newer.

Contributing

If you wish to file a bug, propose an enhancement, or ask other questions, use the issues in the k8ssandra/k8ssandra repository. PRs should target this repository, and you can link a PR to an issue in that repository with the k8ssandra/k8ssandra#ticketNumber syntax.

For other ways to get in touch, check the k8ssandra community resources.

Developer setup

Almost every build, test, or development task requires the following prerequisites:

  • Golang 1.15 or newer
  • Docker, either the docker.io packages on Ubuntu, Docker Desktop for Mac, or your preferred Docker distribution.
  • mage: There are some tips for using mage in docs/developer/mage.md

Building

The operator uses mage for its build process.

Build the Operator Container Image

This build task creates the operator container image, building or rebuilding the binary from the Go sources if necessary:

mage operator:buildDocker

Build the Operator Binary

If you wish to perform only the Go build or rebuild, without creating a container image:

mage operator:buildGo

Testing

mage operator:testGo

End-to-end Automated Testing

Run the fully automated end-to-end tests:

mage integ:run

Docs about testing are in the repository's test documentation. These tests work against any k8s cluster with six or more worker nodes.

Manual Local Testing

There are a number of ways to run the operator, see the following docs for more information:

  • k8s targets: A set of mage targets for automating a variety of tasks across several supported k8s flavors. At the moment, we support KIND, k3d, and GKE. These targets can set up and manage a local cluster in KIND or k3d, as well as a remote cluster in GKE. Both KIND and k3d can simulate a k8s cluster with multiple worker nodes on a single physical machine, though it's necessary to dial down the database memory requests.

The user documentation also contains information on spinning up your first operator instance that is useful regardless of what Kubernetes distribution you're using to do so.

Uninstall

This will destroy all of your data!

Delete your CassandraDatacenters first; otherwise Kubernetes will block deletion because the operator uses a finalizer.

kubectl delete cassdcs --all-namespaces --all

Remove the operator Deployment, CRD, etc.

kubectl delete -f https://raw.githubusercontent.com/k8ssandra/cass-operator/v1.7.1/docs/user/cass-operator-manifests.yaml
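Once the manifest is deleted, you can confirm the CRD is gone (a quick check; the command should return a NotFound error after deletion completes):

kubectl get crd cassandradatacenters.cassandra.datastax.com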

Contacts

For development questions, please reach out on the Development mailing list, or open an issue in the k8ssandra/k8ssandra GitHub repository.

For usage questions, please visit our User mailing list.

License

Copyright DataStax, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Issues
  • Blog post or documentation on how to replace broken (or potentially broken) nodes?

    Is your feature request related to a problem? Please describe. As an operator of cassandra, I want to replace nodes when for example their disks have degraded or have any other signs of hardware failure. on the k8ssandra.io website I would like to see some guide explaining how this would work in the Kubernetes world. Also describing how to replace an already failed node would be nice.

    Describe the solution you'd like

    Describe the two scenarios in the docs on how to replace a non-broken node, and how to replace a broken node.

    I think the instructions would look something like this Non-broken:

    1. cordon the node
    2. delete the pod
    3. delete the pvc (and the pv) :question: (This might be needed in case of local storage; as the PVC is bound to a specific node, and you want to make sure a new PVC is created (with WaitForConsumer) so that the pod ends up on a new node and not the old one again)
    4. delete the node or replace the broken persistent volume
    5. Pod should now be pending
    6. add new node or uncordon the existing node (if the volume was replaced)
    7. PVC binds to the new disk, and pod should now be running

    Broken node:

    1. The pod is pending as it is trying to reschedule to the broken node; where its PV and PVC is bound to (e.g. when using local persistent volume)
    2. Delete the PVC (Need to do this first. Otherwise if you delete the pod, it will be bound to the existing PVC, and thus be re-scheduled on the broken node, and stay Pending forever)
    3. Delete the pod
    4. Pod and PVC should be recreated now and both pending
    5. Add new node

    These are hypothetical steps. I didn't test them. But it would be nice to describe these procedures. especially when using local persistent volumes some care has to be taken to make sure that the new pods get scheduled on new nodes. This is tricky and can probably use a good step-by-step guide.
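
    For illustration, the non-broken procedure sketched above might translate into kubectl commands roughly like this (hypothetical and untested, assuming the default server-data volume claim template and the example pod names used elsewhere in this README):

    kubectl cordon <node-name>
    kubectl -n cass-operator delete pod cluster1-dc1-default-sts-0
    kubectl -n cass-operator delete pvc server-data-cluster1-dc1-default-sts-0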

    Even better would be if we could somehow automate this in the operator. But IDK what that'd look like

    Describe alternatives you've considered None

    Additional context Add any other context or screenshots about the feature request here.

    ┆Issue is synchronized with this Jira Task by Unito

    opened by sync-by-unito[bot] 11
  • migration from cass-operator to k8ssandra

    I have multiple cassandra clusters running with the cass-operator, and I cannot lose the data in those clusters. I also need to avoid any downtime, to the extent possible. They are all in EKS hosted kubernetes clusters.

    An upgrade path or document, of how to import cass-operator based clusters into a 'full' k8ssandra setup would be needed.

    One idea could be to create a new cluster in k8ssandra and then sync the data between clusters, and then 'just' swap the k8s service to point to new cluster.

    Avoiding any data loss, and minimizing downtime would be the highest priority. Temporarily using more resources would not be an issue in my use case.

    ┆Issue is synchronized with this Jira Task by Unito

    opened by sync-by-unito[bot] 11
  • Not compatible with Kubernetes 1.22

    When trying to install the cass operator on Kubernetes 1.22 like this

    kubectl apply -n cass-operator -f https://raw.githubusercontent.com/k8ssandra/cass-operator/v1.7.1/docs/user/cass-operator-manifests.yaml

    it fails with the following error message:

    unable to recognize "https://raw.githubusercontent.com/k8ssandra/cass-operator/v1.8.0-rc.1/docs/user/cass-operator-manifests.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beta1"
    unable to recognize "https://raw.githubusercontent.com/k8ssandra/cass-operator/v1.8.0-rc.1/docs/user/cass-operator-manifests.yaml": no matches for kind "ValidatingWebhookConfiguration" in version "admissionregistration.k8s.io/v1beta1"
    

    The reason for this is that apiextensions.k8s.io/v1beta1 and admissionregistration.k8s.io/v1beta1 are not supported anymore and have been replaced with the */v1 version.

    ┆Issue is synchronized with this Jira Bug by Unito ┆Fix Versions: cass-operator-1.8.0,k8ssandra-1.4.0 ┆Issue Number: K8SSAND-884 ┆Priority: Medium

    bug 
    opened by owetterau 10
  • Updates to podTemplateSpec do not propagate to the statefulSet

    Problem description

    If I make a change to the podTemplateSpec within the cassDC CR, I do not see the change propagate to the sts without manually deleting it.

    Reproducing

    For example, if I have a cassandra container defined like so within the podTemplateSpec field:

    podTemplateSpec:
      metadata:
        creationTimestamp: null
      spec:
        containers:
        - env:
          - name: LOCAL_JMX
            value: "no"
          name: cassandra
          resources: {}

    I add imagePullPolicy: IfNotPresent to the container. This does not propagate without manually deleting the sts.

    ┆Issue is synchronized with this Jira Task by Unito ┆Issue Number: K8SSAND-918 ┆Priority: Medium

    bug needs-triage 
    opened by Miles-Garnsey 10
  • Updating Statefulsets is broken when upgrading to 1.7.0

    What happened? I created a CassandraDatacenter with cass-operator 1.6.0. I then updated to 1.7.0. cass-operator fails to apply StatefulSet changes. cass-operator logs this error:

    {"level":"error","ts":1621959974.8570168,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"cassandradatacenter-controller","request":"default/labels","error":"StatefulSet.apps \"labels-labels-default-sts\" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden","stacktrace":"github.com/go-logr/zapr.(_zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(_Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(_Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(_Controller).worker\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88"}

    Note this part in particular:

    Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden"
    

    This regression is due to the changes in #18 which change the ServiceName property of the StatefulSet.

    cass-operator logs this error and does not continue with the reconciliation process. The Cassandra pods will remain running. If you try to change the CassandraDatacenter spec in such a way that would result in a change to the StatefulSet, the changes won't be applied.

    Did you expect to see something different?

    How to reproduce it (as minimally and precisely as possible):

    1. Deploy cass-operator 1.6.0
    2. Create a CassandraDatacenter
    apiVersion: cassandra.datastax.com/v1beta1
    kind: CassandraDatacenter
    metadata:
      name: labels
    spec:
      clusterName: labels
      size: 1
      storageConfig:
        cassandraDataVolumeClaimSpec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi
          storageClassName: standard
      serverType: cassandra
      serverVersion: 3.11.10
      serverImage: k8ssandra/cass-management-api:3.11.10-v0.1.24
      disableSystemLoggerSidecar: true
      dockerImageRunsAsCassandra: true
      podTemplateSpec:
        metadata:
          labels:
            env: dev
        spec:
          containers: []
    
    1. Wait for the CassandraDatacenter to become ready
    2. Upgrade the cass-operator deployment to 1.7.0
    3. Check the cass-operator logs and you should find see the above error message

    Environment

    • Cass Operator version:

      v1.7.0

      Anything else we need to know?: The error occurs in the `CheckRackPodTemplate` function in `reconcile_racks.go`. This will impact any existing CassandraDatacenter that upgrades cass-operator. The bug will not impact new CassandraDatacenters installed with 1.7.0. I am inclined to say that we need to revert the changes in #18; however, doing so will introduce this problem for users who created new CassandraDatacenters with 1.7.0 and then go to upgrade. Given that, we need to carefully consider how best to resolve this.

    ┆Issue is synchronized with this Jiraserver Task by Unito ┆Fix Versions: k8ssandra-1.2.0,cass-operator-1.7.1 ┆Issue Number: K8SSAND-483 ┆Priority: Highest

    bug 
    opened by jsanda 9
  • K8ssandra backup to S3 via Medusa

    Type of question

    Backup/restore-related question

    What did you do? Followed the document https://k8ssandra.io/docs/topics/restore-a-backup/ in order to set up backup/restore. Is the documentation up to date? Did you expect to see something different? I expect to see a medusa container running under the cassandra node pods, but I see only 2 containers instead of 3; the medusa container is not being created.

    kubectl logs k8ssandra-dc1-default-sts-0 -c medusa
    error: container medusa is not valid for pod k8ssandra-dc1-default-sts-0
    
     kubectl get pods
    NAME                                                READY   STATUS      RESTARTS   AGE
    k8ssandra-cass-operator-7d5df6d49-5hvpk             1/1     Running     0          98m
    k8ssandra-dc1-default-sts-0                         2/2     Running     0          98m
    k8ssandra-dc1-default-sts-1                         2/2     Running     0          91m
    k8ssandra-dc1-default-sts-2                         2/2     Running     0          91m
    k8ssandra-dc1-stargate-644f7fd75b-2w944             1/1     Running     0          98m
    k8ssandra-grafana-679b4bbd74-lt9gb                  2/2     Running     0          15m
    k8ssandra-kube-prometheus-operator-85695ffb-2vwqj   1/1     Running     0          98m
    k8ssandra-medusa-operator-798567d685-7t52p          1/1     Running     0          15m
    k8ssandra-reaper-7bb77d575c-clxrs                   1/1     Running     0          97m
    k8ssandra-reaper-operator-79fd5b4655-h2p74          1/1     Running     0          98m
    k8ssandra-reaper-schema-bqhfd                       0/1     Completed   0          97m
    prometheus-k8ssandra-kube-prometheus-prometheus-0   2/2     Running     1          98m
    
    

    Environment

    DEV

    • Helm charts version 1.0.0

    ┆Issue is synchronized with this Jira Task by Unito

    opened by sync-by-unito[bot] 7
  • Regression test for #110

    What this PR does: Adds a regression test for #110

    I tried running the integration tests locally with kind, but they keep timing out. Hope that this PR will trigger some automated CI that makes the test go red

    Which issue(s) this PR fixes: Adds a regression test for #110

    Checklist

    • [ ] Changes manually tested
    • [x] Automated Tests added/updated
    • [ ] Documentation added/updated
    • [ ] CHANGELOG.md updated (not required for documentation PRs)
    • [x] CLA Signed: DataStax CLA
    opened by arianvp 7
  • There are no logs for Cassandra startup errors

    Bug Report

    Describe the bug It is difficult to debug Cassandra startup errors. The primary process in the cassandra container is the management-api. It manages the lifecycle of Cassandra. When a Cassandra pod is running but not ready, this generally indicates that Cassandra has failed to start for some reason. Often times there are no logs available which makes debugging really difficult.

    I will provide an example using an invalid heap configuration.

    To Reproduce Steps to reproduce the behavior:

    1. helm install broken-cluster k8ssandra/k8ssandra -f test-values.yaml

    test-values.yaml:

    cassandra:
      heap:
        size: 768
        newGenSize: 256
      datacenters:
      - name: dc1
        size: 3
    
    1. Wait for the cassandra container to start. It should not reach the ready state.
    2. Do kubectl logs broken-cluster-dc1-default-sts-0 -c server-system-logger. It will report:

    can't open '/var/log/cassandra/system.log': No such file or directory

    1. Do kubectl exec -it broken-cluster-dc1-default-sts-0 -c cassandra -- ls /var/log/cassandra. The directory is empty.

    Expected behavior Logs with the failure should be available. If I open a shell into the Cassandra pod and start Cassandra in the foreground with -f, I get the following output:

    [email protected]:/opt/cassandra/bin$ ./cassandra -f
    Error occurred during initialization of VM
    Too small initial heap
    

    This output should be captured in some log file.

    • Helm charts version info 1.0, 1.1

    ┆Issue is synchronized with this Jira Task by Unito

    opened by sync-by-unito[bot] 6
  • cass-operator image tag is not updated to v1.7.1 in cass-operator-manifests.yaml in 1.7.1 release

    What happened?

    Did you expect to see something different?

    How to reproduce it (as minimally and precisely as possible):

    Environment

    • Cass Operator version:

      Insert image tag or Git SHA here

      * Kubernetes version information: `kubectl version`
      * Kubernetes cluster kind: insert how you created your cluster: kops, bootkube, etc.
      * Manifests: insert manifests relevant to the issue

    • Cass Operator Logs: insert Cass Operator logs relevant to the issue here

    Anything else we need to know?:

    ┆Issue is synchronized with this Jiraserver Task by Unito ┆Fix Versions: k8ssandra-1.2.0,cass-operator-1.7.1 ┆Issue Number: K8SSAND-487 ┆Priority: Medium

    bug complexity:low 
    opened by anoopps79 5
  • Allow to tune readahead on the Cassandra data PV mount.

    Default settings for read ahead on most systems is 256 sectors (with a sector size of 512 bytes, that's 130 kb off disk per read). We need to set readahead to 64 sectors by default (32kb) and allow to further tune the value (the 32 - 64 range proved to be the best compromise between reads and compactions performance).
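
    For reference, readahead on a block device can be inspected and changed with blockdev (a sketch; the device path is illustrative, values are in 512-byte sectors, and the setting does not persist across reboots):

    blockdev --getra /dev/sdb
    blockdev --setra 64 /dev/sdb   # 64 sectors = 32 KiB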

    ┆Issue is synchronized with this Jira Task by Unito

    opened by sync-by-unito[bot] 5
  • Add additional labels to all resources with managed-by label

    What this PR does: Adds new kubernetes labels to all managed resources:

    app.kubernetes.io/name app.kubernetes.io/instance app.kubernetes.io/version

    Which issue(s) this PR fixes: Fixes #185

    Checklist

    • [ ] Changes manually tested
    • [x] Automated Tests added/updated
    • [ ] Documentation added/updated
    • [x] CHANGELOG.md updated (not required for documentation PRs)
    • [x] CLA Signed: DataStax CLA
    opened by burmanm 0
  • Use the new mgmt api Go Client in cass-operator

    The httphelper should be replaced with the new wrapped/facaded Go client for the management api in cass-operator.

    ┆Issue is synchronized with this Jira Task by Unito ┆Epic: Management API Go client ┆Issue Number: K8SSAND-971 ┆Priority: Medium

    opened by sync-by-unito[bot] 1
  • Create a k8ssandra wrapper library for the mgmt api Go client

    Once the go client for the management api is available, we need to code a wrapper library to make it usable in the context of K8ssandra, or some facade code that would allow us to elegantly deal with sending requests to the right pods. This code should be generic in a way that it doesn't need to be updated when the Go Client gets modified to support a new operation.

    ┆Issue is synchronized with this Jira Task by Unito ┆Epic: Management API Go client ┆Issue Number: K8SSAND-969 ┆Priority: Medium

    opened by sync-by-unito[bot] 0
  • Implement /versions/feature endpoint and Supports feature in the client

    What this PR does: Adds ability to poll the management-api what features it provides.

    Which issue(s) this PR fixes: Fixes #

    Checklist

    • [x] Changes manually tested
    • [x] Automated Tests added/updated
    • [ ] Documentation added/updated
    • [ ] CHANGELOG.md updated (not required for documentation PRs)
    • [x] CLA Signed: DataStax CLA
    opened by burmanm 2
  • Tests need to verify that everything works with older mgmt-api versions also

    What happened? As seen in PR #176 , if we had only tested this with a new version of management-apis, all the tests would pass. But as soon as user would use an older version things would break completely and the cass-operator is completely unusable. We need to verify that new features that use newer features from management-api also work with older serverImages.

    bug 
    opened by burmanm 1
  • root file system in cassandra container should be read-only

    Why do we need it? The root file system should be read-only for improved security. This is considered a best practice for security.

    #196 was created specifically for DSE. The goal for this ticket is to address both OSS Cassandra and DSE. If it turns out that there are substantially different changes required for Cassandra vs DSE, then they should be addressed in separate PRs.

    Here is an example manifest that configures the cassandra container with a read-only root file system:

    apiVersion: cassandra.datastax.com/v1beta1
    kind: CassandraDatacenter
    metadata:
      name: dc1
    spec:
      clusterName: test
      serverType: cassandra
      serverVersion: "4.0.0"
      systemLoggerImage:
      serverImage:
      size: 1
      storageConfig:
        cassandraDataVolumeClaimSpec:
          storageClassName: standard
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi
      podTemplateSpec:
        spec:
          containers:
            - name: "cassandra"
              securityContext:
                readOnlyRootFilesystem: true
                runAsNonRoot: true
      config:
        jvm-server-options:
          initial_heap_size: "800M"
          max_heap_size: "800M"
    

    The cassandra container goes into a crash loop with this in the log:

    Starting Management API
    cp: cannot create regular file '/opt/cassandra/conf/cassandra-env.sh': Read-only file system
    cp: cannot create regular file '/opt/cassandra/conf/cassandra-rackdc.properties': Read-only file system
    cp: cannot create regular file '/opt/cassandra/conf/cassandra.yaml': Read-only file system
    cp: cannot create regular file '/opt/cassandra/conf/jvm11-server.options': Read-only file system
    cp: cannot create regular file '/opt/cassandra/conf/jvm8-server.options': Read-only file system
    cp: cannot create regular file '/opt/cassandra/conf/jvm.options': Read-only file system
    cp: cannot create regular file '/opt/cassandra/conf/jvm-server.options': Read-only file system
    cp: cannot create regular file '/opt/cassandra/conf/logback.xml': Read-only file system
    

    Environment

    • Cass Operator version: v1.8.0

    Anything else we need to know?: We have already implemented a solution (or at least partial) in the k8ssandra Helm chart. Look at the cassdc.yaml template here. Here are the highlights:

    • Add a cassandra-config volume to be used to /etc/cassandra
    • The base-config-init init container copies out of box configs to cassandra-config
    • The server-config-init init container (i.e., config builder) runs and writes files to the server-config volume
    • The cassandra container mounts server-config at /config
    • The cassandra container mounts cassandra-config at /etc/cassandra
    • The entry point script of the cassandra container copies files from /config to /etc/cassandra

    Again this what is already done in k8ssandra. We basically need to backport it.

    ┆Issue is synchronized with this Jira Task by Unito ┆Issue Number: K8SSAND-962 ┆Priority: Medium

    enhancement security 
    opened by jsanda 4
  • Unable to create CassandraDatacenter if Setup containers.securityContext.readOnlyRootFilesystem: true

    What happened? I tried to create a CassandraDatacenter with the containers.securityContext.readOnlyRootFilesystem: true, but the pod is always in the CrashLoopBackOff status.

    The pods are running normally if I change the containers.securityContext.readOnlyRootFilesystem: false

    The yaml

    # Sized to work on 3 k8s workers nodes with 1 core / 4 GB RAM
    # See neighboring example-cassdc-full.yaml for docs for each parameter
    apiVersion: cassandra.datastax.com/v1beta1
    kind: CassandraDatacenter
    metadata:
      name: dc21
    spec:
      nodeAffinityLabels:
        beta.kubernetes.io/arch: amd64
      clusterName: cluster2
      serverType: dse
      serverVersion: "6.8.14"
      systemLoggerImage: 
      serverImage: 
      configBuilderImage: 
      managementApiAuth:
        insecure: {}
      size: 1
      resources:
        requests:
          cpu: 1
          memory: 4Gi
        limits:
          cpu: 1
          memory: 4Gi
      storageConfig:
        cassandraDataVolumeClaimSpec:
          storageClassName: nfs-client
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi
      dockerImageRunsAsCassandra: false
      podTemplateSpec:
        spec:
          initContainers:
          - name: server-config-init
            securityContext:
              allowPrivilegeEscalation: false
              capabilities:
                drop:
                - ALL
              privileged: false
              readOnlyRootFilesystem: true
              runAsNonRoot: true
          containers:
          - name: "cassandra"
            securityContext:
              allowPrivilegeEscalation: false
              capabilities:
                drop:
                - ALL
              privileged: false
              readOnlyRootFilesystem: true
              runAsNonRoot: true
          hostIPC: false
          hostNetwork: false
          hostPID: false
          securityContext:
            runAsNonRoot: true
      config:
        jvm-server-options:
          initial_heap_size: "800M"
          max_heap_size: "800M"
          additional-jvm-opts:
            # As the database comes up for the first time, set system keyspaces to RF=3
            - "-Ddse.system_distributed_replication_dc_names=dc21"
            - "-Ddse.system_distributed_replication_per_dc=3"
    

    The pod status

    MacBook-Pro-3:db zhiminsun$ oc get pod 
    NAME                                                 READY   STATUS             RESTARTS   AGE
    cluster2-dc21-default-sts-0                          1/2     CrashLoopBackOff   213        17h
    

    The pod Events error

    Events:
      Warning  BackOff         62s (x7 over 103s)  kubelet, worker2.zhim.cp.fyre.ibm.com  Back-off restarting failed container
    

    Did you expect to see something different? I expect that containers.securityContext.readOnlyRootFilesystem: true

    ┆Issue is synchronized with this Jira Task by Unito ┆Issue Number: K8SSAND-954 ┆Priority: Medium

    bug 
    opened by zhimsun 5
  • Create OLM bundle for 1.8.0

    What is missing? We currently create only a skeleton version of the OLM bundle & catalog files, but they do not include all the information that the previous versions have included and as such are not automatically deployable after we create a release. We should seek to include all the necessary information in them.

    Why do we need it? Openshift integration needs it for OperatorHub, as mentioned in #194

    enhancement 
    opened by burmanm 0
  • Add new Management API endpoints to HTTP Helper: GetKeyspaceReplication, ListTables, CreateTable [K8SSAND-926]

    What this PR does:

    This PR adds new Management API endpoints to HTTP Helper: GetKeyspaceReplication, ListTables, CreateTable.

    It requires https://github.com/k8ssandra/management-api-for-apache-cassandra/pull/143.

    Which issue(s) this PR fixes:

    This PRs is part of the fix for https://github.com/k8ssandra/k8ssandra-operator/issues/156.

    Checklist

    • [x] Changes manually tested
    • [x] Automated Tests added/updated
    • [ ] Documentation added/updated
    • [x] CHANGELOG.md updated (not required for documentation PRs)
    • [x] CLA Signed: DataStax CLA
    opened by adutra 4
  • [Discuss] Simplify CassDC services

    What is missing?

    The current services in the cassDC are slightly confusing to users at times. We have CLUSTERNAME-DCNAME-all-pods-service as well as a CLUSTERNAME-DCNAME-dc1-service which differ only in that all-pods-service has additional ports and also exposes pods which are not yet passing readiness probes.

    We also have the additional-seeds-service which is empty for all single DC deployments (most deployments).

    Why do we need it?

    Services are usually subject to monitoring and having excess services clutters the operator experience. For developers, it is currently confusing which service should be used to talk to the Cassandra API on 9042. While experienced users will be able to infer this from which ports are open, it makes things less intuitive.

    Proposed solution

    This is just a suggestion, there may be additional factors to consider given commentary in k8ssandra-operator issue #67.

    I suggest we rename:

    CLUSTERNAME-DCNAME-service -> CLUSTERNAME-DCNAME-client-apis
    CLUSTERNAME-DCNAME-all-pods-service -> CLUSTERNAME-DCNAME-internal-monitoring

    (No need for the type of the resource at the end of the name as this is obvious when inspecting it.)

    I also suggest we find a way to avoid creating the additional-seeds-service unless additional seeds are actually required. Testing will be needed to ensure we continue to avoid an additional STS rolling restart - which is not desirable.

    ┆Issue is synchronized with this Jira Task by Unito ┆Issue Number: K8SSAND-952 ┆Priority: Medium

    enhancement 
    opened by Miles-Garnsey 5