Asynchronous data replication for Kubernetes volumes

Overview

VolSync

VolSync asynchronously replicates Kubernetes persistent volumes between clusters using either rsync or rclone. It also supports creating backups of persistent volumes via restic.

Getting started

The fastest way to get started is to install VolSync in a kind cluster:

  • Install kind if you don't already have it:
    $ go install sigs.k8s.io/kind@latest
  • Use our convenient script to start a cluster and install the CSI hostpath driver and the snapshot controller.
    $ ./hack/setup-kind-cluster.sh
  • Install the latest release via Helm
    $ helm repo add backube https://backube.github.io/helm-charts/
    $ helm install --create-namespace -n volsync-system volsync backube/volsync
  • See the usage instructions for information on setting up replication relationships.

More detailed information on installation and usage can be found in the official documentation.
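
A replication relationship is just a pair of custom resources: a ReplicationDestination in the namespace (or cluster) receiving the data, and a ReplicationSource pointing at the PVC to protect. The following rsync-based sketch is illustrative only; the names, namespaces, schedule, address, and SSH key Secret are placeholders, and the official documentation remains the authoritative reference for these fields:

  # Destination side (illustrative values)
  $ kubectl apply -f - <<EOF
  apiVersion: volsync.backube/v1alpha1
  kind: ReplicationDestination
  metadata:
    name: database-destination
    namespace: dest
  spec:
    rsync:
      copyMethod: Snapshot
      capacity: 2Gi
      accessModes: [ReadWriteOnce]
      serviceType: LoadBalancer
  EOF

  # Source side (illustrative values)
  $ kubectl apply -f - <<EOF
  apiVersion: volsync.backube/v1alpha1
  kind: ReplicationSource
  metadata:
    name: database-source
    namespace: source
  spec:
    sourcePVC: mysql-pv-claim
    trigger:
      schedule: "*/10 * * * *"
    rsync:
      copyMethod: Clone
      sshKeys: <SSH key Secret copied from the destination>
      address: <address of the destination rsync Service>
  EOF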

VolSync kubectl plugin

We're also working on a command line interface to VolSync via a kubectl plugin. To try that out:

make cli
cp bin/kubectl-volsync /usr/local/bin/

NOTE: The volsync plugin is under active development. Options, flags, and names are likely to change frequently. PRs and new issues are welcome!

Available commands:

kubectl volsync start-replication
kubectl volsync set-replication
kubectl volsync continue-replication
kubectl volsync remove-replication

Try the current examples:

Helpful links

Licensing

This project is licensed under the GNU AGPL 3.0 License with the following exceptions:

Comments
  • Document migration CLI sub-command

    Describe what this PR does

    • [x] Document kubectl-volsync migration
    • [x] Fix references to external migration
    • [x] Remove old external sync script

    Is there anything that requires special attention?

    Related issues: Fixes: #154 Depends on: #141 (docs)

    lgtm approved size/L 
    opened by JohnStrunk 20
  • set minKubeVersion to 1.19.0 in CSV (added also to version.mk file)

    Signed-off-by: Tesshu Flower

    Describe what this PR does

    • adds MIN_KUBE_VERSION of 1.19.0 to version.mk

    • Subsequent calls to make bundle will update the bundle csv to use this MIN_KUBE_VERSION.

    • Sets the minKubeVersion in the bundle csv to 1.19.0

    Is there anything that requires special attention?

    Related issues:

    lgtm approved size/XS 
    opened by tesshuflower 18
  • Port restic e2e tests to ansible

    Signed-off-by: Tesshu Flower

    Describe what this PR does Ports the following kuttl tests over to the ansible e2e tests:

    • restic-with-manual-trigger
    • restic-with-previous
    • restic-with-restoreasof
    • restic-without-trigger

    Is there anything that requires special attention?

    • I modified the write_to_pvc role to optionally leave the pod behind to allow for the podAffinity parts to work

    • I had to alter the behavior of restic-without-trigger slightly as I wasn't able to catch the condition Synchronizing going to "false" - previously we did this in 20-ensure-multiple-syncs:

      kubectl -n $NAMESPACE wait --for=condition=Synchronizing=true --timeout=5m ReplicationSource/source
      kubectl -n $NAMESPACE wait --for=condition=Synchronizing=false --timeout=5m ReplicationSource/source
      kubectl -n $NAMESPACE wait --for=condition=Synchronizing=true --timeout=5m ReplicationSource/source
      

      I think with no trigger the status is changing so fast that I wasn't able to catch it with the ansible k8s calls. Instead, I wait until another sync completes and make sure the lastSyncTime doesn't match the previous one (a rough shell equivalent is sketched after this list). This might make the test take a bit longer.

    • Additionally moves some of the pvc writer roles to use jobs instead of creating pods directly - this is to avoid issues when running tests manually with users with different permissions (for example running the tests as an admin user means the pods get created with a different SCC which affects the pod options). It will still make sense to move the other pod roles (reader ones) into jobs, but this can be done in a separate PR.
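
    As a rough illustration of the lastSyncTime-based wait described above (the real logic lives in the ansible e2e roles; the resource name and NAMESPACE variable follow the kuttl snippet earlier in this list):

      PREV=$(kubectl -n $NAMESPACE get replicationsource/source -o jsonpath='{.status.lastSyncTime}')
      until [ "$(kubectl -n $NAMESPACE get replicationsource/source -o jsonpath='{.status.lastSyncTime}')" != "$PREV" ]; do
        sleep 5
      done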

    Related issues:

    lgtm approved size/XXL 
    opened by tesshuflower 15
  • Pvc storage size - clone & snapshot - use capacity if possible

    Describe what this PR does For pvcFromSnapshot:

    • attempt to use the restoreSize from the snapshot to determine size to create the PVC
    • if restoreSize isn't available, attempt to use the status.capacity from the origin pvc
    • fall back to using the requested storage size from the origin pvc

    For clone:

    • attempt to use the status.capacity from the origin pvc
    • fall back to using the requested storage size from the origin pvc

    Is there anything that requires special attention?

    • For snapshots, we had discussed making sure the snapshot is bound at the beginning of pvcFromSnapshot(). It turns out that EnsurePVCFromSrc() calls ensureSnapshot first and then pvcFromSnapshot (see https://github.com/backube/volsync/blob/main/controllers/volumehandler/volumehandler.go#L73-L78), and ensureSnapshot will only return a snapshot after it's bound, so the bound check was already there: https://github.com/backube/volsync/blob/fa4bdcfb2d88d438f39bdceb1159c5a2f30fbaa6/controllers/volumehandler/volumehandler.go#L405-L414

      The issue I found when recreating this is that it's possible to get a snapshot with a status like this:

      {"boundVolumeSnapshotContentName":"snapcontent-b735e05f-6352-45bf-a5ab-89a87577b64f","readyToUse":false}}
      

      That is, bound but with no restoreSize set yet, and then later get one with restoreSize set, so in this sort of situation we'll always be falling back to the PVC capacity.

      We could potentially try to wait for the snapshot to report readyToUse: true before trying to create a PVC? (See the sketch after this list.)

      Unfortunately, I wasn't able to find any concrete information about whether these fields are mandatory or not; none are specifically shown as required in the status as far as I can tell.

    • For clone, one thing I see that we could hit is where a user has their storageclass set to WaitForFirstConsumer. In this case, if a user creates a ReplicationSource for that PVC, then capacity may never be filled out, and for clone we'll still fall back to using the requested size. I wasn't sure if this is a real concern out there or not - we could potentially check that the source pvc is in Bound state before proceeding.

      Note: I don't think this is an issue in the volumesnapshot case, as I believe volumesnapshots won't go into a bound state until the PVC itself has become bound.
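
    If waiting for readiness turns out to be the right approach, a hypothetical pre-check could look like the following (the snapshot name is a placeholder, and --for=jsonpath requires a reasonably recent kubectl):

      kubectl -n $NAMESPACE wait volumesnapshot/<snapshot-name> --for=jsonpath='{.status.readyToUse}'=true --timeout=2m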

    Related issues: https://github.com/backube/volsync/issues/246 https://github.com/backube/volsync/issues/48

    lgtm approved size/L 
    opened by tesshuflower 15
  • Syncthing - permission reduction

    Signed-off-by: Tesshu Flower

    Describe what this PR does

    • Runs syncthing as normal user by default
    • Runs syncthing as root with elevated permissions if namespace has the volsync privileged mover annotation set
    • Enables specifying the pod security context in the syncthing spec

    Is there anything that requires special attention?

    Related issues: https://github.com/backube/volsync/issues/368

    lgtm approved size/XXL 
    opened by tesshuflower 14
  • Implements the Syncthing data mover

    Describe what this PR does

    This PR implements the Syncthing data mover in the VolSync operator, making use of Syncthing's REST API.

    The following things are added:

    • [x] Syncthing controller implementation
    • [x] Syncthing controller unit-testing
    • [x] E2E testing for the Syncthing data mover
    • [x] Documentation for the Syncthing mover

    Is there anything that requires special attention?

    Related issues:

    lgtm approved size/XXL 
    opened by RobotSail 14
  • Fix auto gen check for release branches

    • custom scorecard config generation was always using "latest" which doesn't actually match what we want in release branches - where we want to use the custom-scorecard-image tagged with "release-x.y". This attempts to use the correct tag on the custom scorecard image.

    Signed-off-by: Tesshu Flower

    Describe what this PR does

    Is there anything that requires special attention?

    Related issues:

    lgtm approved size/S 
    opened by tesshuflower 13
  • Use nodeSelector rather than nodeName to not bypass scheduler

    Signed-off-by: Tesshu Flower

    Describe what this PR does Stops setting NodeName in the mover job spec and instead uses NodeSelector. Specifying NodeName directly bypasses the scheduler, which means that if we have a PVC (like the restic cache PVC) that is Pending because the storageclass has volumeBindingMode: WaitForFirstConsumer, the PVC will be stuck waiting for its first consumer while the mover pod is stuck waiting for the cache PVC to be Bound. Using NodeSelector does go through the scheduler, and everything starts as it should.

    Is there anything that requires special attention? We are using the common node label kubernetes.io/hostname for the NodeSelector.
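
    For illustration only (this is not the actual mover Job template; the Job name and node name worker-1 are placeholders), the difference boils down to pinning via a node label so the pod still passes through the scheduler, which in turn triggers binding of WaitForFirstConsumer PVCs:

      $ kubectl apply -f - <<EOF
      apiVersion: batch/v1
      kind: Job
      metadata:
        name: node-pinned-example
      spec:
        template:
          spec:
            restartPolicy: Never
            # nodeSelector is honored by the scheduler; nodeName: worker-1 would bypass it
            nodeSelector:
              kubernetes.io/hostname: worker-1
            containers:
              - name: main
                image: busybox
                command: ["true"]
      EOF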

    Related issues: https://github.com/backube/volsync/issues/361#issuecomment-1211065869

    lgtm approved size/M 
    opened by tesshuflower 13
  • Syncthing: Fix CI e2e

    Describe what this PR does Ensures that the config.xml file is readable in the mover image.

    This is to fix the following (from container logs):

    $ kubectl -n kuttl-test-brave-sheepdog logs pod/volsync-syncthing-1-755d844cdb-7trw5
    ===== STARTING CONTAINER =====
    ===== VolSync Syncthing container version: v0.5.0+ed1e00f =====
    ===== run =====
    ===== Running preflight check =====
    ===== ensuring necessary variables are defined =====
    ===== populating /config with /config.xml =====
    cp: cannot open '/config.xml' for reading: Permission denied
    

    It was causing the mover pod to CrashLoopBackOff (CLBO).

    • ~This is also attempting to fix #309 by failing if the certs can't be copied from the secret~ Turned out to be a CI config problem
    • Attempts to fix #306 by waiting for previous pod to be deleted
    • Enables Syncthing backward compatibility w/ TLS 1.2 because the current FIPS-enabled golang builder doesn't support TLS 1.3

    Is there anything that requires special attention?

    Related issues:

    lgtm approved size/M 
    opened by JohnStrunk 13
  • e2e: Fix Rclone privileged test

    Describe what this PR does The rclone tests (privileged and unprivileged) were using the same s3 path, causing a race. This was only noticeable for the priv test because that one was preserving the UID of the files.

    Also:

    • Dumps logs from the "helper tools" roles pods.
    • Backs out ignoring the mismatches of UIDs (#494)

    Is there anything that requires special attention?

    Related issues: Reverts #494

    lgtm approved size/L 
    opened by JohnStrunk 12
  • Update to golang 1.18

    Describe what this PR does Updates the version of golang to 1.18

    Is there anything that requires special attention? We have been stuck on golang 1.16 due to ubi8/go-toolset not supporting anything more recent. Given that the kube ecosystem is now requiring 1.18 and there doesn't seem to be any movement in go-toolset, this PR moves us back to the official golang builder images. The unfortunate part is that we can't build w/ the hooks for goboring to make sure we're still ok w/ fips in the product builds. I see no reasonable way to get both. :man_shrugging:

    There's also a type assertion fix in here, which is really a fix for my earlier "fix" that I believe made the test non-useful.

    golangci-lint had to be upgraded to get support for go 1.18.

    Related issues: Should unblock #147

    lgtm approved size/M 
    opened by JohnStrunk 12
  • Volume populator for ReplicationDestination

    Describe the feature you'd like to have. It should be possible to use the ReplicationDestination object as the dataSourceRef of a PVC to enable easier promotion of the latest replicated image.

    What is the value to the end user? (why is it a priority?) Today, users must manually copy the .status.latestImage information into the PVC in order to promote a volume. In addition to not being user friendly, this has an inherent race condition where the replication cycle may replace the Snapshot before it can be properly restored. By using the Data Populator feature, the RD can just be used directly, and it can be left to VolSync to handle the timing issues related to Snapshot promotion.

    How will we know we have a good solution? (acceptance criteria)

    • Placing a ReplicationDestination reference into the dataSourceRef field of a new PVC will cause the PVC to be provisioned with the contents of the latest replicated data.
    • VolSync would be responsible for handling the race between new replications replacing the Snapshot and the Snapshot's restoration into the PVC.
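
    An illustrative sketch of the proposed usage (field values are placeholders; the exact dataSourceRef contents would be defined by the implementation):

    $ kubectl apply -f - <<EOF
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: restored-data
      namespace: myns
    spec:
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 2Gi
      dataSourceRef:
        apiGroup: volsync.backube
        kind: ReplicationDestination
        name: my-replication-destination
    EOF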

    Additional context

    enhancement 
    opened by JohnStrunk 0
  • Return status info from movers

    Describe the feature you'd like to have. Today, there is relatively little information about what is happening during the sync process. We publish events for things like snapshots/clones, but once the Job starts, there is no available information until it completes. I'd like to be able to return status updates from the data movers so that the information can be exposed to end users without requiring them to kubectl logs <volsync-mover-job>.

    What is the value to the end user? (why is it a priority?) It would be good for the mover jobs to be able to expose status information so that users can more easily see what the current status of synchronization is. Is it syncing? Is it failing to connect?

    How will we know we have a good solution? (acceptance criteria)

    • Movers can send short "status updates" to the controller
    • The controller can publish the status updates to the associated CR's status field (probably via the Synchronizing condition)

    Additional context Updates we might want to publish:

    • (Un)successful connection to a remote (i.e., is the network working)
    • Transfer rates and/or progress information (x% complete; MM:SS remaining)
    • Amount of data transferred
    • Relevant timestamps w/ the above

    Potential methods for returning information:

    • Specially formatted log lines that could be scraped by the controller
    • http endpoint that could be read by the operator
    enhancement 
    opened by JohnStrunk 0
  • Upgrade base image to ubi9

    Describe the feature you'd like to have. Our current container images are based on ubi8 (RHEL8). We should move these to ubi9.

    What is the value to the end user? (why is it a priority?)

    How will we know we have a good solution? (acceptance criteria)

    Additional context

    • Combine with #448
    enhancement 
    opened by JohnStrunk 0
  • build(deps): bump github.com/operator-framework/api from 0.17.1 to 0.17.2 in /custom-scorecard-tests

    Bumps github.com/operator-framework/api from 0.17.1 to 0.17.2.

    Release notes

    Sourced from github.com/operator-framework/api's releases.

    v0.17.2

    What's Changed

    Full Changelog: https://github.com/operator-framework/api/compare/v0.17.0...v0.17.2

    Commits
    • 028731a adding grokspawn to tide owners file (#268)
    • b611f6c update k8s 1.25 validation logic (#270)
    • b527a19 (makefile) Upgrade mikefarah/yq to v4 (#267)
    • 4d4ed5a crds,Makefile: Bump controller-tools version to v0.9.0 (#263)
    • e4d13db go.*,pkg: Remove the duplicate github.com/blang/semver dependency (#264)
    • 72295ed Makefile: Remove the -v go mod tidy flag to fix the verify check (#262)
    • ff2dbc5 bump k8s to 1.25 and go to 1.19 (#260)
    • See full diff in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    lgtm approved size/XL dependencies go 
    opened by dependabot[bot] 4
  • Need PVC name as a label in each volsync metrics

    Describe the feature you'd like to have. Need to add the PVC name as a new label on each VolSync metric (link). The disaster recovery monitoring dashboard and alerting are at the PVC level. Once a PVC is DR-protected, its replication health will be monitored. For that, the PVC name is needed in the VolSync metrics to figure out the exact replication health.

    What is the value to the end user? (why is it a priority?) Using DR monitoring, the user can identify the replication health, and if anything goes wrong, they will be notified via alerts.

    How will we know we have a good solution? (acceptance criteria)

    Additional context

    enhancement help wanted 
    opened by GowthamShanmugam 0
Releases(v0.5.0)
  • v0.5.0(Sep 15, 2022)

    Added

    • New data mover based on Syncthing for live data synchronization.
    • Users can manually label destination Snapshot objects with volsync.backube/do-not-delete to prevent VolSync from deleting them. This provides a way for users to avoid having a Snapshot deleted while they are trying to use it. Users are then responsible for deleting the Snapshot (see the example after this list).
    • Publish Kubernetes Events to help troubleshooting
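
    For example, to protect a destination snapshot from automatic cleanup (the namespace and snapshot name are placeholders, and the label value shown is illustrative):

      $ kubectl -n <namespace> label volumesnapshot <snapshot-name> volsync.backube/do-not-delete=true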

    Changed

    • Operator-SDK upgraded to 1.22.0
    • Rclone upgraded to 1.59.0
    • Restic upgraded to 0.13.1
    • Syncthing upgraded to 1.20.1

    Fixed

    • Fix to RoleBinding created by VolSync for OCP namespace labeler.
    • Fix to helm charts to remove hardcoded overwriting of pod security settings.
    • Fix for node affinity (when using ReplicationSource in Direct mode) to use NodeSelector.
    • Fixed log timestamps to be more readable.
    • CLI: Fixed bug where previously specified options couldn't be removed from relationship file
    • Fixed issue where a snapshot or clone created from a source PVC could request an incorrect size if the PVC capacity did not match the requested size.

    Security

    • kube-rbac-proxy upgraded to 0.13.0

    Removed

    • "Reconciled" condition removed from ReplicationSource and ReplicationDestination .status.conditions[] in favor of returning errors via the "Synchronizing" Condition.
    Source code(tar.gz)
    Source code(zip)
    kubectl-volsync.tar.gz(18.73 MB)
  • v0.4.0(May 12, 2022)

    Added

    • Helm: Add ability to specify container images by SHA hash
    • Started work on new CLI (kubectl plugin)
    • Support FIPS mode on OpenShift
    • Added additional field LastSyncStartTime to CRD status

    Changed

    • Rename CopyMethod None to Direct to make it more descriptive.
    • Upgrade OperatorSDK to 1.15
    • Move Rclone and Rsync movers to the Mover interface
    • Switch snapshot API version from snapshot.storage.k8s.io/v1beta1 to snapshot.storage.k8s.io/v1 so that VolSync remains compatible w/ Kubernetes 1.24+
    • Minimum Kubernetes version is now 1.20 due to the switch to snapshot.storage.k8s.io/v1

    Fixed

    • Resources weren't always removed after each sync iteration
    Source code(tar.gz)
    Source code(zip)
    kubectl-volsync.tar.gz(17.10 MB)
  • v0.3.0(Aug 5, 2021)

    Added

    • Introduced internal "Mover" interface to make adding/maintaining data movers more modular
    • Added a Condition on the CRs to indicate whether they are synchronizing or idle.
    • Rclone: Added unit tests

    Changed

    • Renamed the project: Scribe → VolSync
    • CRD group has changed from scribe.backube to volsync.backube
    • CRD status Conditions changed from operator-lib to the implementation in apimachinery

    Fixed

    • Restic: Fixed error when the volume is empty
    Source code(tar.gz)
    Source code(zip)
Owner
Backube
Data protection for Kubernetes