nano-gpu-agent is a Kubernetes device plugin for allocating GPU resources on a node.

Overview

Nano GPU Agent

About this Project

Nano GPU Agent is a Kubernetes device plugin implementation for allocating and using GPUs in containers. It runs as a DaemonSet on Kubernetes nodes. It works as follows:

  • Registers GPU core and memory resources on the node
  • Allocates and shares GPU resources among containers
  • Supports GPU resource QoS and isolation with a specific GPU driver (e.g. qgpu)
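The first two steps can be illustrated with a minimal Go sketch. It is not the agent's actual code: the functions `deviceIDs` and `sameDevices` are hypothetical, the `nano-<gpu>-<unit>` ID format is inferred from the log output quoted in the Issues section, and the assumption of 100 core units per GPU is illustrative only. The sketch shows how a plugin might advertise one virtual device per core unit, and how a device list received from the kubelet could be compared with a pod's recorded device list regardless of order.

```go
package main

import (
	"fmt"
	"sort"
)

// deviceIDs returns one virtual device ID per GPU core unit.
// Assumption: the "nano-<gpu>-<unit>" format is inferred from log output,
// not taken from the agent's source.
func deviceIDs(gpuIndex, units int) []string {
	ids := make([]string, 0, units)
	for u := 0; u < units; u++ {
		ids = append(ids, fmt.Sprintf("nano-%d-%02d", gpuIndex, u))
	}
	return ids
}

// sameDevices reports whether two ID lists hold the same IDs in any order.
// It sorts copies so the callers' slices are left untouched.
func sameDevices(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

func main() {
	// Assumption: a GPU is split into 100 core units.
	ids := deviceIDs(0, 100)
	fmt.Println(len(ids), ids[0], ids[99]) // prints "100 nano-0-00 nano-0-99"

	requested := []string{"nano-0-12", "nano-0-00"}
	recorded := []string{"nano-0-00", "nano-0-12"}
	fmt.Println(sameDevices(requested, recorded)) // prints "true"
}
```

Comparing sorted copies keeps the check independent of the order in which the IDs arrive, which matters because the two lists in the quoted log contain the same IDs in different orders.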

For the complete solution and further details, please refer to Nano GPU Scheduler.

Architecture

Issues
  • Get(name string) (*v1.Pod, error) returns not found

    Hello, while using version 0.3.0 of the agent, the log keeps printing "pod not found". I tried debugging it:

        func (s podNamespaceLister) Get(name string) (*v1.Pod, error) {
            obj, exists, err := s.indexer.GetByKey(s.namespace + "/" + name)
            if err != nil {
                return nil, err
            }
            if !exists {
                return nil, errors.NewNotFound(v1.Resource("pod"), name)
            }
            return obj.(*v1.Pod), nil
        }

    nil is returned here. I then checked cache.indexer and found it is empty. Is this a configuration problem? I also tried the GetPodFromApiServer method; the pod was found, but nil was returned again at the annotation step.

    log:
    I1208 08:34:39.101510 11059 server.go:113] PreStartContainerRequest sorted DeviceIDs: [nano-0-00 nano-0-12 nano-0-20 nano-0-29 nano-0-36 nano-0-49 nano-0-52 nano-0-53 nano-0-54 nano-0-61 nano-0-66 nano-0-69 nano-0-74 nano-0-77 nano-0-78 nano-0-79 nano-0-83 nano-0-88 nano-0-91 nano-0-96]
    I1208 08:34:39.102701 11059 locator.go:79] pod default/cuda-gpu-test-fdd6db75c-55plj lodated with device list [nano-0-83 nano-0-78 nano-0-54 nano-0-66 nano-0-29 nano-0-36 nano-0-61 nano-0-49 nano-0-74 nano-0-88 nano-0-00 nano-0-96 nano-0-12 nano-0-69 nano-0-79 nano-0-91 nano-0-52 nano-0-53 nano-0-77 nano-0-20]
    2021-12-08T08:34:50Z debug layer=debugger continuing
    E1208 08:34:50.637959 11059 server.go:123] get pod default/cuda-gpu-test-fdd6db75c-55plj:cuda failed: pod "cuda-gpu-test-fdd6db75c-55plj" not found

    k8s version: 1.21.6

    opened by quietnight 1
  • Fix some problems and make a new image for version 0.3.0

    1. Support running GPU containers with Docker
    2. Update the README to add a step on setting the default Docker runtime to nvidia
    3. Update the locator to support Kubernetes 1.21+
    opened by borgerli 0
Owner
Nano GPU
Nano GPU is a GPU framework that lets users use GPU resources on Kubernetes right out of the box.