Kueue: Kubernetes-native Job Queueing

Overview

Kueue is a set of APIs and controllers for job queueing. It is a job-level manager that decides when a job should start (that is, when its pods can be created) and when it should stop (that is, when its active pods should be deleted). The main design principle for Kueue is to avoid duplicating existing functionality: autoscaling, pod-to-node scheduling, job lifecycle management, and advanced admission control are the responsibility of core Kubernetes components or commonly accepted frameworks, namely cluster-autoscaler, kube-scheduler, kube-controller-manager, and Gatekeeper, respectively.

bit.ly/kueue-apis (join the mailing list to get access) discusses the API proposal and gives a high-level description of how Kueue operates, while bit.ly/kueue-controller-design presents the detailed design of the controller.

Usage

Requires Kubernetes 1.22 or newer.

You can run Kueue with the following command:

IMG=registry.example.com/kueue:latest make docker-build docker-push deploy

The controller will run in the kueue-system namespace. Then you can apply some of the samples:

kubectl apply -f config/samples/minimal.yaml
kubectl create -f config/samples/sample-job.yaml

Community, discussion, contribution, and support

Learn how to engage with the Kubernetes community on the community page.

You can reach the maintainers of this project at:

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.

Comments
  • [Umbrella] ☂️ Requirements for release 0.1.0

    Deadline: May 16th, KubeCon EU

    Issues that we need to complete to consider Kueue ready for a first release:

    • [x] Match workload affinities with flavors #3
    • [x] Single heap per Capacity #87
    • [x] Consistent flavors in a cohort #59
    • [x] Queue status #5
    • [x] Capacity status #7
    • [x] Event for unschedulable workloads #91
    • [x] Capacity namespace selector #4
    • [x] Efficient requeuing #8
    • [x] User guide #64
    • [x] Publish image #52

    Nice to have:

    • [ ] Add borrowing weight #62
    • [ ] E2E test #61
    • [ ] Use kueue.sigs.k8s.io API group #23
    • [ ] Support for one custom job #65
    kind/feature 
    opened by alculquicondor 35
  • Add pending condition to `QueuedWorkload.Status`

    What type of PR is this?

    /kind feature /kind api-change

    What this PR does / why we need it:

    Which issue(s) this PR fixes:

    Initial pass on #102

    Special notes for your reviewer:

    kind/feature size/L kind/api-change cncf-cla: yes 
    opened by ArangoGutierrez 31
  • Add options for internal cert management to component config

    Signed-off-by: tenzen-y [email protected]

    What type of PR is this?

    /kind documentation /kind feature /kind api-change

    What this PR does / why we need it:

    I added options for internal cert management to the component config and tested the operation with the commands below:

    • For default internal cert management options
    kubectl apply -k github.com/tenzen-y/kueue/config/default?ref=add-cert-options-to-component-config-default-cert-opts
    
    • For a customized namespace, serviceName, and secretName
    kubectl apply -k github.com/tenzen-y/kueue/config/default?ref=add-cert-options-to-component-config-customize-cert-opts
    

    Also, I fixed Makefile and added a changelog.
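
    For context, a rough sketch of what such options could look like in the component config Go types; the field names here are assumptions for illustration, not necessarily the PR's final API:

    type InternalCertManagement struct {
    	// Enable toggles internal management of the webhook serving certs.
    	Enable *bool `json:"enable,omitempty"`

    	// WebhookServiceName is the Service through which webhooks are served.
    	WebhookServiceName *string `json:"webhookServiceName,omitempty"`

    	// WebhookSecretName is the Secret that stores the serving certs.
    	WebhookSecretName *string `json:"webhookSecretName,omitempty"`
    }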

    Which issue(s) this PR fixes:

    Fixes #270

    Special notes for your reviewer:

    kind/feature size/L lgtm approved kind/api-change ok-to-test kind/documentation cncf-cla: yes 
    opened by tenzen-y 30
  • Add condition field to ClusterQueueStatus

    Signed-off-by: tenzen-y [email protected]

    What type of PR is this?

    /kind feature /kind api-change

    What this PR does / why we need it:

    I added a conditions field to ClusterQueue.

    Which issue(s) this PR fixes:

    Fixes #250

    Special notes for your reviewer:

    ~~API design~~

    Determined API design:

    type ClusterQueueStatus struct {
    	// usedResources are the resources (by flavor) currently in use by the
    	// workloads assigned to this clusterQueue.
    	// +optional
    	UsedResources UsedResources `json:"usedResources"`
    
    	// PendingWorkloads is the number of workloads currently waiting to be
    	// admitted to this clusterQueue.
    	// +optional
    	PendingWorkloads int32 `json:"pendingWorkloads"`
    
    	// AdmittedWorkloads is the number of workloads currently admitted to this
    	// clusterQueue and haven't finished yet.
    	// +optional
    	AdmittedWorkloads int32 `json:"admittedWorkloads"`
    
    	// conditions hold the latest available observations of the ClusterQueue
    	// current state.
    	// +optional
    	// +listType=map
    	// +listMapKey=type
    	Conditions []metav1.Condition `json:"conditions,omitempty"`
    }
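
    For illustration, a hedged sketch of how a controller could maintain this field using the standard apimachinery helper; the Active condition type below is an assumption, not part of this PR:

    import (
    	apimeta "k8s.io/apimachinery/pkg/api/meta"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // setActiveCondition records whether the ClusterQueue is active.
    // SetStatusCondition updates the entry matching Type (or appends it) and
    // only bumps LastTransitionTime when the status value actually changes.
    func setActiveCondition(status *ClusterQueueStatus, active bool, reason, msg string) {
    	cond := metav1.Condition{
    		Type:    "Active", // illustrative condition type
    		Status:  metav1.ConditionTrue,
    		Reason:  reason,
    		Message: msg,
    	}
    	if !active {
    		cond.Status = metav1.ConditionFalse
    	}
    	apimeta.SetStatusCondition(&status.Conditions, cond)
    }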
    
    kind/feature lgtm size/XL approved kind/api-change cncf-cla: yes 
    opened by tenzen-y 27
  • Fix the command for cloud build

    Signed-off-by: tenzen-y [email protected]

    What type of PR is this?

    /kind bug

    What this PR does / why we need it:

    As discussed, I fixed the command for cloud build.

    Which issue(s) this PR fixes:

    Fixes #

    Special notes for your reviewer:

    kind/bug lgtm tide/merge-method-squash approved ok-to-test cncf-cla: yes size/S 
    opened by tenzen-y 27
  • Need to improve the readability of the log

    1.6451684909657109e+09	INFO	controller-runtime.metrics	Metrics server is starting to listen	{"addr": "127.0.0.1:8080"}
    1.6451684909663508e+09	INFO	setup	starting manager
    1.6451684909665146e+09	INFO	Starting server	{"kind": "health probe", "addr": "[::]:8081"}
    1.645168490966593e+09	INFO	Starting server	{"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
    I0218 07:14:51.066639       1 leaderelection.go:248] attempting to acquire leader lease kueue-system/c1f6bfd2.gke-internal.googlesource.com...
    I0218 07:15:07.705977       1 leaderelection.go:258] successfully acquired lease kueue-system/c1f6bfd2.gke-internal.googlesource.com
    1.6451685077060497e+09	DEBUG	events	Normal	{"object": {"kind":"ConfigMap","namespace":"kueue-system","name":"c1f6bfd2.gke-internal.googlesource.com","uid":"e70e4b9b-54f4-4782-a904-e57d3001c8e6","apiVersion":"v1","resourceVersion":"264201"}, "reason": "LeaderElection", "message": "kueue-controller-manager-7ff7b759bf-nszmb_05445f7f-a871-4a4c-83c1-af075b850e49 became leader"}
    1.6451685077061899e+09	DEBUG	events	Normal	{"object": {"kind":"Lease","namespace":"kueue-system","name":"c1f6bfd2.gke-internal.googlesource.com","uid":"72b48bf0-20e0-42a4-823b-2a6edcb3288a","apiVersion":"coordination.k8s.io/v1","resourceVersion":"264202"}, "reason": "LeaderElection", "message": "kueue-controller-manager-7ff7b759bf-nszmb_05445f7f-a871-4a4c-83c1-af075b850e49 became leader"}
    1.6451685077062488e+09	INFO	controller.queue	Starting EventSource	{"reconciler group": "kueue.x-k8s.io", "reconciler kind": "Queue", "source": "kind source: *v1alpha1.Queue"}
    1.645168507706281e+09	INFO	controller.queue	Starting Controller	{"reconciler group": "kueue.x-k8s.io", "reconciler kind": "Queue"}
    1.6451685077062566e+09	INFO	controller.queuedworkload	Starting EventSource	{"reconciler group": "kueue.x-k8s.io", "reconciler kind": "QueuedWorkload", "source": "kind source: *v1alpha1.QueuedWorkload"}
    1.6451685077063015e+09	INFO	controller.queuedworkload	Starting Controller	{"reconciler group": "kueue.x-k8s.io", "reconciler kind": "QueuedWorkload"}
    1.6451685077062776e+09	INFO	controller.capacity	Starting EventSource	{"reconciler group": "kueue.x-k8s.io", "reconciler kind": "Capacity", "source": "kind source: *v1alpha1.Capacity"}
    1.6451685077063189e+09	INFO	controller.capacity	Starting Controller	{"reconciler group": "kueue.x-k8s.io", "reconciler kind": "Capacity"}
    1.6451685077064047e+09	INFO	controller.job	Starting EventSource	{"reconciler group": "batch", "reconciler kind": "Job", "source": "kind source: *v1.Job"}
    1.6451685077064307e+09	INFO	controller.job	Starting EventSource	{"reconciler group": "batch", "reconciler kind": "Job", "source": "kind source: *v1alpha1.QueuedWorkload"}
    1.6451685077064393e+09	INFO	controller.job	Starting Controller	{"reconciler group": "batch", "reconciler kind": "Job"}
    1.6451685078075259e+09	INFO	controller.queuedworkload	Starting workers	{"reconciler group": "kueue.x-k8s.io", "reconciler kind": "QueuedWorkload", "worker count": 1}
    1.6451685078075113e+09	INFO	controller.capacity	Starting workers	{"reconciler group": "kueue.x-k8s.io", "reconciler kind": "Capacity", "worker count": 1}
    1.645168507807566e+09	INFO	controller.queue	Starting workers	{"reconciler group": "kueue.x-k8s.io", "reconciler kind": "Queue", "worker count": 1}
    1.6451685078076618e+09	INFO	controller.job	Starting workers	{"reconciler group": "batch", "reconciler kind": "Job", "worker count": 1}
    1.645168507807886e+09	LEVEL(-2)	job-reconciler	Job reconcile event	{"job": {"name":"ingress-nginx-admission-create","namespace":"kube-system"}}
    1.645168507808418e+09	LEVEL(-2)	job-reconciler	Job reconcile event	{"job": {"name":"ingress-nginx-admission-patch","namespace":"kube-system"}}
    1.6451685078085716e+09	LEVEL(-2)	job-reconciler	Job reconcile event	{"job": {"name":"kube-eventer-init-v1.6-a92aba6-aliyun","namespace":"kube-system"}}
    1.6451706903900485e+09	LEVEL(-2)	capacity-reconciler	Capacity create event	{"capacity": {"name":"cluster-total"}}
    1.6451706904384277e+09	LEVEL(-2)	queue-reconciler	Queue create event	{"queue": {"name":"main","namespace":"default"}}
    1.6451707150770907e+09	LEVEL(-2)	job-reconciler	Job reconcile event	{"job": {"name":"sample-job-jjbq2","namespace":"default"}}
    1.6451707150895817e+09	LEVEL(-2)	queued-workload-reconciler	QueuedWorkload create event	{"queuedWorkload": {"name":"sample-job-jjbq2","namespace":"default"}, "queue": "main", "status": "pending"}
    1.645170715089716e+09	LEVEL(-2)	scheduler	Workload assumed in the cache	{"queuedWorkload": {"name":"sample-job-jjbq2","namespace":"default"}, "capacity": "cluster-total"}
    1.6451707150901928e+09	LEVEL(-2)	job-reconciler	Job reconcile event	{"job": {"name":"sample-job-jjbq2","namespace":"default"}}
    1.6451707150984285e+09	LEVEL(-2)	scheduler	Successfully assigned capacity and resource flavors to workload	{"queuedWorkload": {"name":"sample-job-jjbq2","namespace":"default"}, "capacity": "cluster-total"}
    1.6451707150985863e+09	LEVEL(-2)	queued-workload-reconciler	QueuedWorkload update event	{"queuedWorkload": {"name":"sample-job-jjbq2","namespace":"default"}, "queue": "main", "capacity": "cluster-total", "status": "assigned", "prevStatus": "pending", "prevCapacity": ""}
    1.6451707150986767e+09	LEVEL(-2)	job-reconciler	Job reconcile event	{"job": {"name":"sample-job-jjbq2","namespace":"default"}}
    

    We can choose to switch to klog/v2.
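
    Alternatively, we could keep controller-runtime's zap integration and just configure a readable time encoder. A minimal sketch, assuming the options exposed by sigs.k8s.io/controller-runtime/pkg/log/zap:

    import (
    	uzap "go.uber.org/zap"
    	"go.uber.org/zap/zapcore"

    	ctrl "sigs.k8s.io/controller-runtime"
    	"sigs.k8s.io/controller-runtime/pkg/log/zap"
    )

    func setupLogger() {
    	encCfg := uzap.NewDevelopmentEncoderConfig()
    	// Replace the epoch-seconds timestamps shown above with ISO 8601.
    	encCfg.EncodeTime = zapcore.ISO8601TimeEncoder
    	ctrl.SetLogger(zap.New(
    		zap.UseDevMode(true), // console encoding instead of JSON
    		zap.Encoder(zapcore.NewConsoleEncoder(encCfg)),
    	))
    }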

    help wanted priority/important-soon lifecycle/stale priority/backlog kind/cleanup 
    opened by denkensk 26
  • Efficient re-queueing of unschedulable workloads

    Currently we relentlessly keep trying to schedule jobs.

    We need to do something similar to what we did in the scheduler: re-queue based on capacity/workload/queue events.
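
    A minimal sketch of the idea (the types here are illustrative stand-ins, not the actual Kueue implementation): park workloads that failed admission and push them back into the heap only when an event could change the outcome.

    import "sync"

    // QueuedWorkload and workloadHeap stand in for the real types.
    type QueuedWorkload struct{ Name string }

    type workloadHeap struct{ items []*QueuedWorkload }

    func (h *workloadHeap) Push(w *QueuedWorkload) { h.items = append(h.items, w) }

    type queueManager struct {
    	mu sync.Mutex
    	// Workloads that failed admission, parked per capacity instead of
    	// being retried in a tight loop. Assumed initialized by a constructor.
    	inadmissible map[string][]*QueuedWorkload
    }

    // requeueLater is called when admission fails.
    func (m *queueManager) requeueLater(capacity string, w *QueuedWorkload) {
    	m.mu.Lock()
    	defer m.mu.Unlock()
    	m.inadmissible[capacity] = append(m.inadmissible[capacity], w)
    }

    // capacityChanged is called on capacity/workload/queue events that may
    // free quota (e.g., a workload finished or a capacity was resized).
    func (m *queueManager) capacityChanged(capacity string, heap *workloadHeap) {
    	m.mu.Lock()
    	defer m.mu.Unlock()
    	for _, w := range m.inadmissible[capacity] {
    		heap.Push(w)
    	}
    	delete(m.inadmissible, capacity)
    }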

    /kind feature

    kind/feature size/XL priority/critical-urgent 
    opened by ahg-g 25
  • Cleanup: scheduler tests independent of job controller

    What type of PR is this? /kind cleanup

    What this PR does / why we need it: Made scheduler integration tests independent of job controller

    Which issue(s) this PR fixes: Fixes #155

    Special notes for your reviewer:

    lgtm size/XL approved kind/cleanup ok-to-test cncf-cla: yes 
    opened by thisisprasad 23
  • Release v0.1.1

    Release Checklist

    • [x] All OWNERS must LGTM the release proposal
    • [x] Verify that the changelog in this issue is up-to-date
    • [x] For major or minor releases (v$MAJ.$MIN.0), create a new release branch.
      • [x] An OWNER creates a vanilla release branch with git branch release-$MAJ.$MIN main
      • [x] An OWNER pushes the new release branch with git push release-$MAJ.$MIN
    • [x] Update things like README, deployment templates, docs, configuration, test/e2e flags. Submit a PR against the release branch:
    • [x] An OWNER prepares a draft release
      • [x] Write the change log into the draft release.
      • [x] Run make artifacts IMAGE_REGISTRY=registry.k8s.io/kueue GIT_TAG=$VERSION to generate the artifacts and upload the files in the artifacts folder to the draft release.
    • [x] An OWNER creates a signed tag running git tag -s $VERSION and inserts the changelog into the tag description. To perform this step, you need a PGP key registered on GitHub.
    • [x] An OWNER pushes the tag with git push $VERSION
      • Triggers prow to build and publish a staging container image gcr.io/k8s-staging-kueue/kueue:$VERSION
    • [x] Submit a PR against k8s.io, updating k8s.gcr.io/images/k8s-staging-kueue/images.yaml to promote the container images to production:
    • [x] Wait for the PR to be merged and verify that the image registry.k8s.io/kueue/kueue:$VERSION is available.
    • [x] Publish the draft release prepared at the GitHub releases page.
    • [x] Add a link to the tagged release in this issue: https://github.com/kubernetes-sigs/kueue/releases/tag/v0.1.1
    • [x] Send an announcement email to [email protected] and [email protected] with the subject [ANNOUNCE] kueue $VERSION is released
    • [x] Add a link to the release announcement in this issue: https://groups.google.com/a/kubernetes.io/g/wg-batch/c/7Ayju9Lfg2s
    • [x] For a major or minor release, update README.md and docs/setup/install.md in main branch:
    • [x] For a major or minor release, create an unannotated devel tag in the main branch on the first commit that gets merged after the release branch has been created (presumably the README update commit above), and push the tag: DEVEL=v0.$(($MAJ+1)).0-devel; git tag $DEVEL main && git push $DEVEL. This ensures that the devel builds on the main branch will have a meaningful version number.
    • [x] Close this issue

    Changelog

    • Fixed the number of pending workloads in a BestEffortFIFO ClusterQueue.
    • Fixed a bug in a BestEffortFIFO ClusterQueue where a workload might not be retried after a transient error.
    • Fixed requeuing of an out-of-date workload when admission failed.
    • Fixed a bug in a BestEffortFIFO ClusterQueue where inadmissible workloads were not removed from the ClusterQueue when removing the corresponding Queue.
    opened by alculquicondor 23
  • Bump Go to 1.18

    Signed-off-by: Carlos Eduardo Arango Gutierrez [email protected]

    What type of PR is this?

    What this PR does / why we need it:

    Which issue(s) this PR fixes:

    Fixes #

    Special notes for your reviewer:

    lgtm approved cncf-cla: yes size/S 
    opened by ArangoGutierrez 23
  • Add PriorityClass to the Workload API

    What type of PR is this?

    /kind feature

    What this PR does / why we need it:

    Which issue(s) this PR fixes:

    The first part of #82

    kind/feature size/L lgtm approved cncf-cla: yes 
    opened by denkensk 23
  • custom workload

    Hi, we are looking for exactly this kind of solution for managing and utilizing a large-scale cluster with "batch" workloads. As I understand it (and correct me if I'm wrong), Kueue is still not fully integrated with Kubeflow (training operator)/MPIJob. What I'm trying to understand first is:

    • whether Kueue supports Kubeflow (training operator)/MPIJob
    • if not, what we can do from our side (not from Kubeflow (training operator)/MPIJob) to make Kueue work with those CRDs
    • we have resources named habana.ai/ . As I saw, you support any compute resource name, right?
    • whether we can configure Kueue in some way to control the priority of a queue or the ordering inside a queue

    We are working with PodGroup, so what really interests us is the "queue control" step before the workload gets into the scheduler (I think that's exactly what you're doing here). My concern is how to do the integration with other CRDs and how I can "extend" our requirements into Kueue. For example, in the k8s scheduler we extended PreFilter/Filter and Score with plugins related to our clusters' requirements; I'm wondering how that will work here.

    kind/support 
    opened by talcoh2x 0
  • WIP: Enforce timeout for podsReady

    What type of PR is this?

    /kind feature

    What this PR does / why we need it:

    Which issue(s) this PR fixes:

    Part of: https://github.com/kubernetes-sigs/kueue/issues/349

    Special notes for your reviewer:

    • refactored a little to use a fake clock for the integration tests; this could be split out into a separate PR
    • the PodsReady condition is now added independently of configuration, which might be an omission from the previous PR, to be discussed. For now, in this PR, I always add the PodsReady condition (but it is not used to block admission or to enforce the timeout).
    kind/feature size/L ok-to-test do-not-merge/work-in-progress cncf-cla: yes 
    opened by mimowo 3
  • Send GenericEvents to the clusterqueue-controller only when ResourceFlavors are updated

    What would you like to be added: Follow up https://github.com/kubernetes-sigs/kueue/pull/415#discussion_r1055748611.
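
    A rough sketch of the mechanism, assuming controller-runtime's channel source (v0.13-era API; the handler name is illustrative):

    import (
    	"sigs.k8s.io/controller-runtime/pkg/client"
    	"sigs.k8s.io/controller-runtime/pkg/event"
    )

    var flavorUpdates = make(chan event.GenericEvent)

    // The clusterqueue-controller would watch the channel, e.g.:
    //   Watches(&source.Channel{Source: flavorUpdates}, &handler.EnqueueRequestForObject{})
    //
    // The ResourceFlavor handler would then send events only from its Update
    // handler (not Create/Delete), so ClusterQueues get re-reconciled only
    // when a flavor actually changes:
    func onFlavorUpdate(updated client.Object) {
    	flavorUpdates <- event.GenericEvent{Object: updated}
    }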

    Why is this needed:

    Completion requirements:

    This enhancement requires the following artifacts:

    • [ ] Design doc
    • [ ] API change
    • [ ] Docs update

    The artifacts should be linked in subsequent comments.

    kind/feature 
    opened by tenzen-y 0
  • Use a fake clock to avoid sleep in the podsready integration tests

    In the podsready integration test we sleep for a second to ensure the two created workloads have different creation timestamps: https://github.com/kubernetes-sigs/kueue/blob/e688dccea0c3683a35bd51e9d67dba40a3997d83/test/integration/scheduler/podsready/scheduler_test.go#L131. This could be avoided if the test used a fake clock, which requires refactoring some controllers to allow injecting it.
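
    A sketch of the injection pattern, assuming k8s.io/utils/clock: controllers take a clock.Clock, and tests swap in a fake clock that can be stepped instead of sleeping.

    import (
    	"time"

    	"k8s.io/utils/clock"
    	testingclock "k8s.io/utils/clock/testing"
    )

    type WorkloadReconciler struct {
    	clock clock.Clock // clock.RealClock{} in production
    }

    func (r *WorkloadReconciler) now() time.Time { return r.clock.Now() }

    // In the test, instead of time.Sleep(time.Second):
    //   fakeClock := testingclock.NewFakeClock(time.Now())
    //   reconciler := &WorkloadReconciler{clock: fakeClock}
    //   ... create the first workload ...
    //   fakeClock.Step(time.Second) // the second workload sees a later fake time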

    opened by mimowo 0
  • Use SSA (Server-Side Apply) for setting the workload conditions

    With the use of SSA we will be able to avoid conflicts when different controllers set or add conditions. For example, when a workload is admitted, the WorkloadAdmitted condition is set in a separate request from setting the workload's .spec.admission field. On the other hand, the PodsReady condition is added by the job controller, which may result in a conflict and a need to retry the request. This issue is exposed by the need to retry in the integration test (without the retry the test occasionally fails): https://github.com/kubernetes-sigs/kueue/blob/e688dccea0c3683a35bd51e9d67dba40a3997d83/test/integration/scheduler/podsready/scheduler_test.go#L100. We can avoid the conflicts by using SSA with dedicated field managers: one for PodsReady (job controller) and one for WorkloadAdmitted (scheduler and workload controller).
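
    A minimal sketch of the SSA approach, assuming controller-runtime's Apply patch type; the field-manager name is illustrative:

    import (
    	"context"

    	"sigs.k8s.io/controller-runtime/pkg/client"
    )

    // applyPodsReady applies only the fields this manager owns (here, the
    // PodsReady condition); building that minimal workload object is elided.
    func applyPodsReady(ctx context.Context, c client.Client, wl client.Object) error {
    	return c.Status().Patch(ctx, wl, client.Apply,
    		client.FieldOwner("kueue-job-controller"), // one manager per controller
    		client.ForceOwnership)
    }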

    opened by mimowo 3
  • Update production readiness items

    Signed-off-by: Kante Yin [email protected]

    What type of PR is this?

    /kind cleanup

    What this PR does / why we need it:

    The first step of the scalability tests was added in https://github.com/kubernetes-sigs/kueue/pull/462

    Which issue(s) this PR fixes:

    Fixes #

    Special notes for your reviewer:

    kind/cleanup kind/documentation cncf-cla: yes size/S 
    opened by kerthcet 5
Releases (v0.2.1)
  • v0.2.1 (Aug 25, 2022)

    Changes since v0.1.0:

    Features

    • Upgrade the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported. v1alpha2 includes the following changes:
      • Rename Queue to LocalQueue.
      • Remove ResourceFlavor.labels. Use ResourceFlavor.metadata.labels instead.
    • Add webhooks to validate and to add defaults to all kueue APIs.
    • Add internal cert manager to serve webhooks with TLS.
    • Use finalizers to prevent ClusterQueues and ResourceFlavors in use from being deleted prematurely.
    • Support codependent resources by assigning the same flavor to codependent resources in a pod set.
    • Support pod overhead in Workload pod sets.
    • Set requests to limits if requests are not set in a Workload pod set, matching internal defaulting for k8s Pods.
    • Add prometheus metrics to monitor health of the system and the status of ClusterQueues.
    • Use Server Side Apply for Workload admission to reduce API conflicts.

    Bug fixes

    • Fix a bug that caused Workloads that don't match the ClusterQueue's namespaceSelector to block other Workloads in StrictFIFO ClusterQueues.
    • Fix the number of pending workloads in BestEffortFIFO ClusterQueues status.
    • Fix a bug in BestEffortFIFO ClusterQueues where a workload might not be retried after a transient error.
    • Fix requeuing of an out-of-date workload when admission failed.
    • Fix a bug in BestEffortFIFO ClusterQueues where inadmissible workloads were not removed from the ClusterQueue when removing the corresponding Queue.

    Thanks to all our contributors!

    In no particular order: @ahg-g @alculquicondor @ArangoGutierrez @cmssczy @denkensk @kerthcet @knight42 @cortespao @shuheiktgw @thisisprasad

    Full Changelog: https://github.com/kubernetes-sigs/kueue/compare/v0.1.0...v0.2.1

    Source code(tar.gz)
    Source code(zip)
    manifests.yaml(543.58 KB)
    prometheus.yaml(1.21 KB)
  • v0.2.0 (Aug 25, 2022)

  • v0.1.1 (Jun 13, 2022)

    Changes since v0.1.0:

    • Fixed the number of pending workloads in a BestEffortFIFO ClusterQueue.
    • Fixed a bug in a BestEffortFIFO ClusterQueue where a workload might not be retried after a transient error.
    • Fixed requeuing of an out-of-date workload when admission failed.
    • Fixed a bug in a BestEffortFIFO ClusterQueue where inadmissible workloads were not removed from the ClusterQueue when removing the corresponding Queue.
    Source code(tar.gz)
    Source code(zip)
    manifests.yaml(528.07 KB)
  • v0.1.0 (Apr 12, 2022)

    First release of Kueue, a Kubernetes-native set of APIs and controllers for job queueing.

    The release includes:

    • The API group kueue.x-k8s.io/v1alpha1 that includes the ClusterQueue, Queue, ResourceFlavor, and Workload APIs.
    • A set of controllers that supports quota-based job queuing, with:
      • Resource sharing: you can define unused resources that can be borrowed by other tenants.
      • Resource flavors and fungibility: you can define multiple flavors or variants of a resource. Jobs are assigned to flavors that are still available.
      • Two queueing strategies: StrictFIFO and BestEffortFIFO.
    • Support for the Kubernetes batch/v1.Job API.
    • The Workload API abstraction allows you to integrate a third-party job API with Kueue (see the hedged sketch after this list).
    • Documentation available at https://sigs.k8s.io/kueue/docs
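
    A hypothetical sketch of that integration pattern; all type and helper names below (CustomJob, findWorkloadFor, newWorkloadFor, admitted) are illustrative stand-ins, not the real v1alpha1 API. A controller for a third-party job creates a mirror Workload and only lets the job start once Kueue admits it:

    import (
    	"context"

    	"sigs.k8s.io/controller-runtime/pkg/client"
    )

    // CustomJob, findWorkloadFor, newWorkloadFor, and admitted are
    // hypothetical stand-ins for a third-party job API and its helpers.
    func reconcile(ctx context.Context, c client.Client, job *CustomJob) error {
    	wl := findWorkloadFor(ctx, c, job)
    	if wl == nil {
    		// Mirror the job's pod sets and resource requests into a Workload.
    		return c.Create(ctx, newWorkloadFor(job))
    	}
    	if admitted(wl) && job.Spec.Suspend {
    		job.Spec.Suspend = false // let the job controller create pods
    		return c.Update(ctx, job)
    	}
    	return nil // stay suspended until the Workload is admitted
    }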

    Thanks to all our contributors!

    In no particular order: @alculquicondor @ahg-g @denkensk @ArangoGutierrez @kerthcet @cortespao @BinacsLee @jiwq @Huang-Wei

    Source code(tar.gz)
    Source code(zip)
    manifests.yaml(528.07 KB)
Owner
Kubernetes SIGs
Org for Kubernetes SIG-related work