A Kubernetes plugin that enables dynamically adding or removing GPU resources for a running Pod

Overview

GPU Mounter


GPU Mounter is a Kubernetes plugin that enables adding or removing GPU resources for running Pods. Reading this Introduction (in Chinese) is recommended; it explains what GPU Mounter is and why it exists.

Schematic Diagram Of GPU Dynamic Mount

Features

  • Supports adding or removing GPU resources of a running Pod without stopping or restarting it
  • Compatible with the Kubernetes scheduler

Prerequisite

  • Kubernetes v1.16.2 / v1.18.6 (other versions not tested; v1.13+ is required, v1.15+ is recommended)
  • Docker 19.03 / 18.09 (other versions not tested)
  • Nvidia GPU device plugin
  • nvidia-container-runtime (must be configured as default runtime)

NOTE: If you are using GPU Mounter on Kubernetes v1.13 or v1.14, you need to manually enable the KubeletPodResources feature gate. It is enabled by default in Kubernetes v1.15+.
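
Both prerequisites above usually need manual host configuration. A minimal sketch, assuming Docker reads /etc/docker/daemon.json and kubelet is managed by systemd (paths and the way kubelet flags are wired up vary by distribution):

# Make nvidia-container-runtime the default Docker runtime
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo systemctl restart docker

# Kubernetes v1.13 / v1.14 only: add --feature-gates=KubeletPodResources=true
# to the kubelet arguments on every GPU node, then restart kubelet
sudo systemctl restart kubelet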

Deploy

  • label GPU nodes with gpu-mounter-enable=enable
kubectl label node <nodename> gpu-mounter-enable=enable
  • deploy
chmod u+x deploy.sh
./deploy.sh deploy
  • uninstall
./deploy.sh uninstall
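
After ./deploy.sh deploy finishes, a quick sanity check with plain kubectl can confirm the workers are up; the gpu-mounter-workers name pattern below matches the logs quoted later on this page, but the exact names and labels depend on deploy.sh:

# check that the worker pods are Running on every labeled GPU node
kubectl get pods -n kube-system -o wide | grep gpu-mounter

# inspect a worker's logs if a pod is not Ready
kubectl logs -n kube-system <gpu-mounter-worker-pod-name>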

Quick Start

See QuickStart.md

FAQ

See FAQ.md

License

This project is licensed under the Apache-2.0 License.

Issues and Contributing

  • Please open an Issue if you experience any problems
  • Pull requests are very welcome if you have any ideas to make GPU Mounter better.

Comments

  • gpu-mounter-worker Error: Can not connect to /var/lib/kubelet/pod-resources/kubelet.sock


    k8s version: v1.14, docker version: 18.09.5

    In the test cluster, kubelet.sock on the host is located under /var/lib/kubelet/device-plugins,

    so I changed the mount path in the gpu-mounter-worker.yaml file:

         volumes:
            - name: cgroup
              hostPath:
                type: Directory
                path: /sys/fs/cgroup
            - name: device-monitor
              hostPath:
                type: Directory
                #path: /var/lib/kubelet/pod-resources
                path: /var/lib/kubelet/device-plugins
            - name: log-dir
              hostPath:
                type: DirectoryOrCreate
                path: /etc/GPUMounter/log
    

    The error messages are as follows:

    [[email protected] deploy]# kubectl logs -f gpu-mounter-workers-2wfnp    -n kube-system
    2020-12-20T12:30:27.657Z	INFO	GPUMounter-worker/main.go:15	Service Starting...
    2020-12-20T12:30:27.657Z	INFO	gpu-mount/server.go:21	Creating gpu mounter
    2020-12-20T12:30:27.657Z	INFO	allocator/allocator.go:26	Creating gpu allocator
    2020-12-20T12:30:27.657Z	INFO	collector/collector.go:23	Creating gpu collector
    2020-12-20T12:30:27.657Z	INFO	collector/collector.go:41	Start get gpu info
    2020-12-20T12:30:27.660Z	INFO	collector/collector.go:52	GPU Num: 2
    2020-12-20T12:30:27.674Z	ERROR	collector/collector.go:106	Can not connect to /var/lib/kubelet/pod-resources/kubelet.sock
    2020-12-20T12:30:27.674Z	ERROR	collector/collector.go:107	failure getting pod resources rpc error: code = Unimplemented desc = unknown service v1alpha1.PodResourcesLister
    2020-12-20T12:30:27.674Z	ERROR	collector/collector.go:32	Failed to update gpu status
    2020-12-20T12:30:27.674Z	ERROR	allocator/allocator.go:30	Failed to init gpu collector
    2020-12-20T12:30:27.674Z	ERROR	gpu-mount/server.go:25	Filed to init gpu allocator
    2020-12-20T12:30:27.674Z	ERROR	GPUMounter-worker/main.go:18	Failed to init gpu mounter
    2020-12-20T12:30:27.674Z	ERROR	GPUMounter-worker/main.go:19	failure getting pod resources rpc error: code = Unimplemented desc = unknown service v1alpha1.PodResourcesLister
    
    enhancement good first issue 
    opened by ThinkBlue1991 6
  • More graceful dependency management


    GPU Mounter depends on KubeletPodResources api to get GPU usage from kubelet.

    https://github.com/pokerfaceSad/GPUMounter/blob/f827c71c3a1c09fd7413b67629ccb6b1ac95f113/pkg/util/gpu/collector/collector.go#L11

    The KubeletPodResources API is imported from k8s.io/kubernetes directly. Referring to kubernetes/issues/79384 and go/issues/32776, it is necessary to add require directives for matching versions of all of the subcomponents.

    https://github.com/pokerfaceSad/GPUMounter/blob/f827c71c3a1c09fd7413b67629ccb6b1ac95f113/go.mod#L14-L39
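
    One possible way to do that (a sketch, not the project's actual fix; the v0.18.6 versions are an assumption matching a Kubernetes v1.18.6 cluster) is to pin the k8s.io staging modules with replace directives, for example via go mod edit:

        # Pin k8s.io staging modules to the release that matches the target cluster version
        go mod edit \
          -replace k8s.io/api=k8s.io/[email protected] \
          -replace k8s.io/apimachinery=k8s.io/[email protected] \
          -replace k8s.io/client-go=k8s.io/[email protected] \
          -replace k8s.io/kubelet=k8s.io/[email protected]
        go mod tidy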

    enhancement 
    opened by pokerfaceSad 5
  • Is it necessary to bind only one GPU to one slave pod?


    Hello, I am elihe from Zhihu. I have seen your article on Zhihu before. After reading your code I have a question: why is each slave pod bound to only one GPU in the GetAvailableGPU method of pkg/util/gpu/allocator/allocator.go? As far as I'm concerned, in a large-scale cluster this will put additional load on the master node (there will be a larger number of pod creation requests). The creation of multiple single-card slave pods may also cause two competing GPU mount requests to both fail (for example, with 4 available GPUs and two requests that each want to mount 4 cards: one request successfully creates slave pods 1 and 2, the other creates slave pods 3 and 4, and neither can obtain any more resources). If you agree with me, can I submit a merge request to optimize this?

    opened by ilyee 3
  • Problems found while trying it out


    1. Using the example from the README (YAML file below), the Pod may get scheduled onto a non-GPU host:

    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-pod
    spec:
      containers:
        - name: cuda-container
          image: tensorflow/tensorflow:1.13.2-gpu
          command: ["/bin/sh"]
          args: ["-c", "while true; do echo hello; sleep 10;done"]
          env:
           - name: NVIDIA_VISIBLE_DEVICES
             value: "none"
    
    2. If the image does not contain the mknod executable, it fails. Looking at the source code, mknod is indeed required; shouldn't this be documented in the README? (A quick check is sketched after this list.)
    3. If AddGPU fails, the newly created gpu-pod-slave-* Pod keeps occupying GPU resources (unless gpu-pod is deleted). Should a callback be set up to clean it up?
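
    Regarding point 2, a quick check that a target image actually ships mknod could look like this (a sketch; gpu-pod is the example Pod above):

        # mknod is needed by GPU Mounter to create the GPU device node inside the container
        kubectl exec gpu-pod -- sh -c 'command -v mknod || echo "mknod not found"'
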
    bug 
    opened by ThinkBlue1991 3
  • Some TODOs after merge PR #15


    Thanks to @ilyee for adding gang scheduling support in #15, which means a gang scheduler can be selected when we add multiple GPUs.

    Some things still need to be fixed:

    • Add the relevant docs

    • Add the relevant RESTful API

    • allocator/allocator.go:159 log format error

    • All GPU UUIDs need to be provided when unmounting after a gang mount

    opened by pokerfaceSad 2
  • On a GPU node, gpu-worker reports: nvml error: %+vcould not load NVML library


    • On the GPU node, the gpu-worker error log is as follows:
    [[email protected] ~]# kubectl  logs -f gpu-mounter-workers-ccqfv  -n kube-system
    2021-02-10T01:01:09.689Z        INFO    GPUMounter-worker/main.go:15    Service Starting...
    2021-02-10T01:01:09.690Z        INFO    gpu-mount/server.go:21  Creating gpu mounter
    2021-02-10T01:01:09.690Z        INFO    allocator/allocator.go:27       Creating gpu allocator
    2021-02-10T01:01:09.690Z        INFO    collector/collector.go:23       Creating gpu collector
    2021-02-10T01:01:09.690Z        INFO    collector/collector.go:41       Start get gpu info
    2021-02-10T01:01:09.690Z        ERROR   collector/collector.go:43       nvml error: %+vcould not load NVML library
    2021-02-10T01:01:09.690Z        ERROR   collector/collector.go:26       Failed to init gpu collector
    2021-02-10T01:01:09.690Z        ERROR   allocator/allocator.go:31       Failed to init gpu collector
    2021-02-10T01:01:09.690Z        ERROR   gpu-mount/server.go:25  Filed to init gpu allocator
    2021-02-10T01:01:09.690Z        ERROR   GPUMounter-worker/main.go:18    Failed to init gpu mounter
    2021-02-10T01:01:09.690Z        ERROR   GPUMounter-worker/main.go:19    could not load NVML library
    
    • The gpu-worker pod is scheduled on the following node:
    [[email protected] ~]# kubectl  get pod  gpu-mounter-workers-ccqfv  -n kube-system -o wide 
    NAME                        READY   STATUS             RESTARTS   AGE   IP           NODE   NOMINATED NODE   READINESS GATES
    gpu-mounter-workers-ccqfv   0/1     ImagePullBackOff   1814       20d   10.42.5.33   t90    <none>           <none>
    
    • GPU node information:
    (base) [email protected]:~# nvidia-smi 
    Tue Feb 23 14:28:51 2021       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla P100-PCIE...  Off  | 00000000:03:00.0 Off |                    0 |
    | N/A   34C    P0    32W / 250W |   9114MiB / 16280MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla P100-PCIE...  Off  | 00000000:04:00.0 Off |                    0 |
    | N/A   34C    P0    31W / 250W |    520MiB / 16280MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0      1024      C   python                                       869MiB |
    |    0      1225      C   /usr/local/bin/python                       8235MiB |
    |    1      1024      C   python                                       255MiB |
    |    1      1225      C   /usr/local/bin/python                        255MiB |
    +-----------------------------------------------------------------------------+
    
    help wanted 
    opened by ThinkBlue1991 2
  • Slave pod creation will fail if the owner pod's namespace has Resource Quotas enabled


    Refer to the K8S documentation

    If quota is enabled in a namespace for compute resources like cpu and memory, users must specify requests or limits for those values; otherwise, the quota system may reject pod creation.

    So slave pod creation will fail if the owner pod's namespace has resource quotas enabled.

    pods "xxx" is forbidden
    

    And if we make slave pod creation work in the owner pod's namespace by setting resource requests, the slave pod then consumes the namespace's resource quota. That is unreasonable in a multi-tenant cluster scenario.
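
    A quick way to check whether the owner Pod's namespace has compute quotas that would reject a request-less slave pod (a sketch with a placeholder namespace):

        # list any ResourceQuota objects; if cpu/memory quotas exist, pods without
        # requests/limits will be rejected by the quota admission check
        kubectl get resourcequota -n <owner-pod-namespace> -o yaml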

    enhancement 
    opened by pokerfaceSad 2
  • support entire mount


    Entire mount uses only one slave pod to mount all requested GPUs, in order to:

    1. reduce the load on the cluster
    2. decrease the number of pods on a single node
    3. avoid deadlock problems caused by multiple single-card slave pods competing for GPU resources
    opened by ilyee 1
  • Wait until the slave pods deletion finished


    In the current version, the remove GPU service returns without waiting for the deletion of the slave pods.

    This is unreasonable, because kubelet considers the GPU resource still occupied by the slave pod until the slave pod deletion has finished.
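
    From the command line, the behaviour the service should mirror before returning looks roughly like this (only a sketch of the idea; the real fix belongs in the worker code, and the names are placeholders):

        # block until the slave pod object is actually gone, so kubelet frees the GPU
        kubectl wait --for=delete pod/<slave-pod-name> -n <slave-pod-namespace> --timeout=60s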

    enhancement 
    opened by pokerfaceSad 1
  • Hope this time would be fine


    Since there are some duplicate commands in the Dockerfile, I believe it is better to make some modifications. I also upgraded the Ubuntu version, since 16.04 is quite old.

    opened by InfluencerNGZK 0
  • Delete useless commands in Dockerfile, upgrade ubuntu version


    Since there are some duplicate commands in the Dockerfile, I believe it is better to make some modifications. I also upgraded the Ubuntu version, since 16.04 is quite old.

    opened by InfluencerNGZK 0
  • Can not use GPUMounter on k8s


    environment:

    • k8s 1.16.15
    • docker 20.10.10

    Problem: following QuickStart.md, I installed GPUMounter successfully on my k8s cluster. However, I have never managed to request remove gpu or add gpu successfully.

    I pasted some logs from the gpu-mounter-master container:

    remove gpu:

        2022-02-18T03:44:55.184Z INFO GPUMounter-master/main.go:120 access remove gpu service
        2022-02-18T03:44:55.184Z INFO GPUMounter-master/main.go:134 GPU-5d237016-9ea5-77bd-8c2f-2b3fd4bfa2cd
        2022-02-18T03:44:55.184Z INFO GPUMounter-master/main.go:135 GPU-5d237016-9ea5-77bd-8c2f-2b3fd4bfa2cd
        2022-02-18T03:44:55.184Z INFO GPUMounter-master/main.go:146 Pod: jupyter-lab-54d76f5d58-rlklh Namespace: default UUIDs: GPU-5d237016-9ea5-77bd-8c2f-2b3fd4bfa2cd force: true
        2022-02-18T03:44:55.188Z INFO GPUMounter-master/main.go:169 Found Pod: jupyter-lab-54d76f5d58-rlklh in Namespace: default on Node: dev06.ucd.qzm.stonewise.cn
        2022-02-18T03:44:55.193Z INFO GPUMounter-master/main.go:265 Worker: gpu-mounter-workers-fbfj8 Node: dev05.ucd.qzm.stonewise.cn
        2022-02-18T03:44:55.193Z INFO GPUMounter-master/main.go:265 Worker: gpu-mounter-workers-kwmsn Node: dev06.ucd.qzm.stonewise.cn
        2022-02-18T03:44:55.201Z ERROR GPUMounter-master/main.go:217 Invalid UUIDs: GPU-5d237016-9ea5-77bd-8c2f-2b3fd4bfa2cd

    add gpu:

        2022-02-18T03:42:22.897Z INFO GPUMounter-master/main.go:25 access add gpu service
        2022-02-18T03:42:22.898Z INFO GPUMounter-master/main.go:30 Pod: jupyter-lab-54d76f5d58-rlklh Namespace: default GPU Num: 4 Is entire mount: false
        2022-02-18T03:42:22.902Z INFO GPUMounter-master/main.go:66 Found Pod: jupyter-lab-54d76f5d58-rlklh in Namespace: default on Node: dev06.ucd.qzm.stonewise.cn
        2022-02-18T03:42:22.907Z INFO GPUMounter-master/main.go:265 Worker: gpu-mounter-workers-fbfj8 Node: dev05.ucd.qzm.stonewise.cn
        2022-02-18T03:42:22.907Z INFO GPUMounter-master/main.go:265 Worker: gpu-mounter-workers-kwmsn Node: dev06.ucd.qzm.stonewise.cn
        2022-02-18T03:42:22.921Z ERROR GPUMounter-master/main.go:98 Failed to call add gpu service
        2022-02-18T03:42:22.921Z ERROR GPUMounter-master/main.go:99 rpc error: code = Unknown desc = FailedCreated
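
    The Invalid UUIDs error suggests the UUID passed to the remove service may not match any GPU currently tracked for that Pod on the node. A diagnostic sketch for comparing against the UUIDs reported by the driver on the target node:

        # list the GPU UUIDs known to the NVIDIA driver on the node running the owner pod
        nvidia-smi -L
        # example output: GPU 0: Tesla P100-PCIE-16GB (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)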

    opened by Crazybean-lwb 6
  • GPUMounter-worker error in k8s v1.23.1


    GPUMounter-master.log:

        2022-01-16T11:24:14.610Z INFO GPUMounter-master/main.go:25 access add gpu service
        2022-01-16T11:24:14.610Z INFO GPUMounter-master/main.go:30 Pod: test Namespace: default GPU Num: 1 Is entire mount: false
        2022-01-16T11:24:14.627Z INFO GPUMounter-master/main.go:66 Found Pod: test in Namespace: default on Node: rtxws
        2022-01-16T11:24:14.634Z INFO GPUMounter-master/main.go:265 Worker: gpu-mounter-workers-7dsdf Node: rtxws
        2022-01-16T11:24:19.648Z ERROR GPUMounter-master/main.go:98 Failed to call add gpu service
        2022-01-16T11:24:19.648Z ERROR GPUMounter-master/main.go:99 rpc error: code = Unknown desc = Service Internal Error


    GPUMounter-worker.log:

        2022-01-16T11:24:14.635Z INFO gpu-mount/server.go:35 AddGPU Service Called
        2022-01-16T11:24:14.635Z INFO gpu-mount/server.go:36 request: pod_name:"test" namespace:"default" gpu_num:1
        2022-01-16T11:24:14.645Z INFO gpu-mount/server.go:55 Successfully get Pod: default in cluster
        2022-01-16T11:24:14.645Z INFO allocator/allocator.go:159 Get pod default/test mount type
        2022-01-16T11:24:14.645Z INFO collector/collector.go:91 Updating GPU status
        2022-01-16T11:24:14.646Z INFO collector/collector.go:136 GPU status update successfully
        2022-01-16T11:24:14.657Z INFO allocator/allocator.go:59 Creating GPU Slave Pod: test-slave-pod-2f66ed for Owner Pod: test
        2022-01-16T11:24:14.657Z INFO allocator/allocator.go:238 Checking Pods: test-slave-pod-2f66ed state
        2022-01-16T11:24:14.661Z INFO allocator/allocator.go:264 Pod: test-slave-pod-2f66ed creating
        2022-01-16T11:24:19.442Z INFO allocator/allocator.go:277 Pods: test-slave-pod-2f66ed are running
        2022-01-16T11:24:19.442Z INFO allocator/allocator.go:84 Successfully create Slave Pod: %s, for Owner Pod: %s test-slave-pod-2f66edtest
        2022-01-16T11:24:19.442Z INFO collector/collector.go:91 Updating GPU status
        2022-01-16T11:24:19.444Z DEBUG collector/collector.go:130 GPU: /dev/nvidia0 allocated to Pod: test-slave-pod-2f66ed in Namespace gpu-pool
        2022-01-16T11:24:19.444Z INFO collector/collector.go:136 GPU status update successfully
        2022-01-16T11:24:19.444Z INFO gpu-mount/server.go:81 Start mounting, Total: 1 Current: 1
        2022-01-16T11:24:19.444Z INFO util/util.go:19 Start mount GPU: {"MinorNumber":0,"DeviceFilePath":"/dev/nvidia0","UUID":"GPU-7fe47fc1-b21e-e675-f6ff-edd91910f8a7","State":"GPU_ALLOCATED_STATE","PodName":"test-slave-pod-2f66ed","Namespace":"gpu-pool"} to Pod: test
        2022-01-16T11:24:19.444Z INFO util/util.go:24 Pod :test container ID: e317ca7f5eb5e3c523fab9f0744a065cd69013a7c09522318d4bbf98ad0bb1c3
        2022-01-16T11:24:19.444Z INFO util/util.go:30 Successfully get cgroup path: /kubepods/burstable/podc815ee4b-bea0-44ed-8ef4-239e69516ba2/e317ca7f5eb5e3c523fab9f0744a065cd69013a7c09522318d4bbf98ad0bb1c3 for Pod: test
        2022-01-16T11:24:19.445Z ERROR cgroup/cgroup.go:140 Exec "echo 'c 195:0 rw' > /sys/fs/cgroup/devices/kubepods/burstable/podc815ee4b-bea0-44ed-8ef4-239e69516ba2/e317ca7f5eb5e3c523fab9f0744a065cd69013a7c09522318d4bbf98ad0bb1c3/devices.allow" failed
        2022-01-16T11:24:19.445Z ERROR cgroup/cgroup.go:141 Output: sh: 1: cannot create /sys/fs/cgroup/devices/kubepods/burstable/podc815ee4b-bea0-44ed-8ef4-239e69516ba2/e317ca7f5eb5e3c523fab9f0744a065cd69013a7c09522318d4bbf98ad0bb1c3/devices.allow: Directory nonexistent
        2022-01-16T11:24:19.445Z ERROR cgroup/cgroup.go:142 exit status 2
        2022-01-16T11:24:19.445Z ERROR util/util.go:33 Add GPU {"MinorNumber":0,"DeviceFilePath":"/dev/nvidia0","UUID":"GPU-7fe47fc1-b21e-e675-f6ff-edd91910f8a7","State":"GPU_ALLOCATED_STATE","PodName":"test-slave-pod-2f66ed","Namespace":"gpu-pool"}failed
        2022-01-16T11:24:19.445Z ERROR gpu-mount/server.go:84 Mount GPU: {"MinorNumber":0,"DeviceFilePath":"/dev/nvidia0","UUID":"GPU-7fe47fc1-b21e-e675-f6ff-edd91910f8a7","State":"GPU_ALLOCATED_STATE","PodName":"test-slave-pod-2f66ed","Namespace":"gpu-pool"} to Pod: test in Namespace: default failed
        2022-01-16T11:24:19.445Z ERROR gpu-mount/server.go:85 exit status 2


    Environment and versions

    • k8s version: v1.23
    • docker-client version:19.03.13
    • docker-server version:20.10.12

    In k8s v1.23, "/sys/fs/cgroup/devices/kubepods/burstable/pod[pod-id]/[container-id]/devices.allow" has changed to "/sys/fs/cgroup/devices/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod[pod-id]/docker-[container-id].scope/devices.allow".

    So the current GPUMounter cannot work properly on k8s v1.23.

    Could it be updated to support k8s v1.23? Thanks.
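
    For comparison, the write the worker performs (visible in the GPUMounter-worker.log above) assumes the cgroupfs-style layout, while nodes using the systemd cgroup driver expose the layout reported here. A sketch of the two forms, using c 195:0 (/dev/nvidia0) as in the log; the bracketed placeholders follow this report:

        # cgroupfs driver layout (what GPUMounter currently writes)
        echo 'c 195:0 rw' > /sys/fs/cgroup/devices/kubepods/burstable/pod[pod-id]/[container-id]/devices.allow

        # systemd driver layout reported on k8s v1.23 with Docker
        echo 'c 195:0 rw' > /sys/fs/cgroup/devices/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod[pod-id]/docker-[container-id].scope/devices.allow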

    bug 
    opened by cool9203 10
  • Slave Pod BestEffort QoS may lead to GPU resource leak


    In the current version, the slave pod QoS class is BestEffort. The slave pod will most likely go down when an eviction occurs, and this leads to a GPU resource leak (the user pod can still use the GPU resource, but GPU Mounter and kube-scheduler don't know about it at all).
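
    To confirm that a slave pod really lands in the BestEffort class (and is therefore among the first eviction candidates), its QoS class can be read directly (a sketch with placeholder names):

        # BestEffort pods are evicted first under node resource pressure
        kubectl get pod <slave-pod-name> -n <slave-pod-namespace> -o jsonpath='{.status.qosClass}{"\n"}'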

    enhancement 
    opened by pokerfaceSad 0