nano-gpu-agent is a Kubernetes device plugin for allocating GPU resources on a node.

Overview

Nano GPU Agent

About this Project

Nano GPU Agent is a Kubernetes device plugin implementation for allocating GPUs to containers and using them there. It runs as a DaemonSet on Kubernetes nodes and does the following:

  • Registers GPU core and memory resources on the node
  • Allocates and shares GPU resources among containers
  • Supports QoS and isolation of GPU resources with a specific GPU driver (e.g. qgpu)
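
The registration step above can be pictured with a small sketch. It is not the agent's actual code. It assumes that each GPU is split into 100 core units (suggested by the elasticgpu.io/gpu-core: "200" request for two cards in the issue reports) and that each unit is advertised as a device ID of the form nano-<gpu>-<unit>, as seen in the agent's logs. The names coreUnitsPerGPU, virtualDeviceIDs and cardsNeeded are hypothetical.

```go
package main

import "fmt"

// coreUnitsPerGPU is an assumption: one physical card is taken to
// correspond to 100 units of the elasticgpu.io/gpu-core resource.
const coreUnitsPerGPU = 100

// virtualDeviceIDs returns one device ID per core unit of a GPU.
// The "nano-<gpu>-<unit>" format mirrors the IDs seen in the agent's
// logs; it has not been checked against the agent's source.
func virtualDeviceIDs(gpuIndex, units int) []string {
	ids := make([]string, 0, units)
	for u := 0; u < units; u++ {
		ids = append(ids, fmt.Sprintf("nano-%d-%02d", gpuIndex, u))
	}
	return ids
}

// cardsNeeded returns how many whole cards a gpu-core request spans,
// rounding up to the next full card.
func cardsNeeded(coreRequest int) int {
	return (coreRequest + coreUnitsPerGPU - 1) / coreUnitsPerGPU
}

func main() {
	ids := virtualDeviceIDs(0, coreUnitsPerGPU)
	fmt.Println(len(ids), ids[0], ids[len(ids)-1]) // 100 nano-0-00 nano-0-99
	fmt.Println(cardsNeeded(200))                  // 2
}
```

Under these assumptions, a container requesting 20 core units would be handed 20 of the 100 IDs on a card, which matches the length of the device list in the log excerpt in the comments.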

For the complete solution and further details, please refer to Nano GPU Scheduler.

Architecture

Comments
  • Get(name string) (*v1.Pod, error) returns not found

    Hello, when I use version 0.3.0 of the agent, the log keeps printing "pod not found". I tried debugging it:

        func (s podNamespaceLister) Get(name string) (*v1.Pod, error) {
            obj, exists, err := s.indexer.GetByKey(s.namespace + "/" + name)
            if err != nil {
                return nil, err
            }
            if !exists {
                return nil, errors.NewNotFound(v1.Resource("pod"), name)
            }
            return obj.(*v1.Pod), nil
        }

    This returned nil, and when I looked further I found that cache.indexer is empty. Is something misconfigured? I also tried the GetPodFromApiServer method: the pod was found, but nil was returned again at the annotation step.

    log:

    I1208 08:34:39.101510 11059 server.go:113] PreStartContainerRequest sorted DeviceIDs: [nano-0-00 nano-0-12 nano-0-20 nano-0-29 nano-0-36 nano-0-49 nano-0-52 nano-0-53 nano-0-54 nano-0-61 nano-0-66 nano-0-69 nano-0-74 nano-0-77 nano-0-78 nano-0-79 nano-0-83 nano-0-88 nano-0-91 nano-0-96]
    I1208 08:34:39.102701 11059 locator.go:79] pod default/cuda-gpu-test-fdd6db75c-55plj lodated with device list [nano-0-83 nano-0-78 nano-0-54 nano-0-66 nano-0-29 nano-0-36 nano-0-61 nano-0-49 nano-0-74 nano-0-88 nano-0-00 nano-0-96 nano-0-12 nano-0-69 nano-0-79 nano-0-91 nano-0-52 nano-0-53 nano-0-77 nano-0-20]
    2021-12-08T08:34:50Z debug layer=debugger continuing
    E1208 08:34:50.637959 11059 server.go:123] get pod default/cuda-gpu-test-fdd6db75c-55plj:cuda failed: pod "cuda-gpu-test-fdd6db75c-55plj" not found

    k8s version: 1.21.6

    opened by quietnight 1
  • Fix some problems and build a new image for version 0.3.0

    1. Support running GPU containers with Docker
    2. Update the README with a step for setting the default Docker runtime to nvidia
    3. Update the locator to support k8s 1.21+
    opened by borgerli 0
  • Container creation fails when using 2 cards

    Issue

    When deploying a pod, an error is reported if the GPU limit is set to 2 cards; with a limit of 1 card, the pod runs successfully.

    Resource setting

    elasticgpu.io/gpu-core: "200"

    Error message

    Error: failed to start container xxxxx: Error response from daemon: error gathering device information while adding custom device ""

    opened by gzchen008 0
Owner
Nano GPU
Nano GPU is a GPU framework that lets users consume GPU resources on Kubernetes right out of the box.