MIG Partition Editor for NVIDIA GPUs

Overview

MIG (short for Multi-Instance GPU) is a mode of operation in the newest generation of NVIDIA Ampere GPUs. It allows one to partition a GPU into a set of "MIG Devices", each of which appears to the software consuming them as a mini-GPU with a fixed partition of memory and a fixed partition of compute resources. Please refer to the MIG User Guide for a detailed explanation of MIG and the features it provides.

The MIG Partition Editor (nvidia-mig-parted) is a tool designed for system administrators to make working with MIG partitions easier.

It allows administrators to declaratively define a set of possible MIG configurations they would like applied to all GPUs on a node. At runtime, they then point nvidia-mig-parted at one of these configurations, and nvidia-mig-parted takes care of applying it. In this way, the same configuration file can be spread across all nodes in a cluster, and a runtime flag (or environment variable) can be used to decide which of these configurations to actually apply to a node at any given time.

As an example, consider the following configuration for an NVIDIA DGX-A100 node (found in the examples/config.yaml file of this repo):

version: v1
mig-configs:
  all-disabled:
    - devices: all
      mig-enabled: false

  all-enabled:
    - devices: all
      mig-enabled: true
      mig-devices: {}

  all-1g.5gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "1g.5gb": 7

  all-2g.10gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "2g.10gb": 3

  all-3g.20gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "3g.20gb": 2

  all-balanced:
    - devices: all
      mig-enabled: true
      mig-devices:
        "1g.5gb": 2
        "2g.10gb": 1
        "3g.20gb": 1

  custom-config:
    - devices: [0,1,2,3]
      mig-enabled: false
    - devices: [4]
      mig-enabled: true
      mig-devices:
        "1g.5gb": 7
    - devices: [5]
      mig-enabled: true
      mig-devices:
        "2g.10gb": 3
    - devices: [6]
      mig-enabled: true
      mig-devices:
        "3g.20gb": 2
    - devices: [7]
      mig-enabled: true
      mig-devices:
        "1g.5gb": 2
        "2g.10gb": 1
        "3g.20gb": 1

Each of the sections under mig-configs is user-defined, with custom labels used to refer to them. For example, the all-disabled label refers to the MIG configuration that disables MIG for all GPUs on the node. Likewise, the all-1g.5gb label refers to the MIG configuration that slices all GPUs on the node into 1g.5gb devices. Finally, the custom-config label defines a completely custom configuration which disables MIG on the first 4 GPUs on the node, and applies a mix of MIG devices across the rest.
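
As a rough sanity check on a config, the compute slices and memory it requests per GPU can be totalled from the profile names. The sketch below assumes that a profile name such as 1g.5gb encodes "compute slices" and "memory in GB" per device; the function name mig_budget and its profile=count argument format are hypothetical and not part of nvidia-mig-parted. It only sums totals and does not check the placement rules that MIG itself enforces.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nvidia-mig-parted): total the compute
# slices and memory requested by a list of "profile=count" pairs.
# Assumption: a profile named "<N>g.<M>gb" uses N compute slices and M GB.
mig_budget() {
    local slices=0 mem=0 pair profile count g m
    for pair in "$@"; do
        profile="${pair%%=*}"   # e.g. 1g.5gb
        count="${pair##*=}"     # e.g. 2
        g="${profile%%g.*}"     # compute slices per device, e.g. 1
        m="${profile#*.}"       # e.g. 5gb
        m="${m%gb}"             # memory per device in GB, e.g. 5
        slices=$((slices + g * count))
        mem=$((mem + m * count))
    done
    echo "${slices} slices, ${mem} GB"
}

# The all-balanced config from the example file:
mig_budget 1g.5gb=2 2g.10gb=1 3g.20gb=1   # prints "7 slices, 40 GB"
```

The all-1g.5gb config (seven 1g.5gb devices) also totals 7 slices, which matches the largest count used in the example file.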

Using this tool, the following commands can be run to apply each of these configs in turn:

$ nvidia-mig-parted apply -f examples/config.yaml -c all-disabled
$ nvidia-mig-parted apply -f examples/config.yaml -c all-1g.5gb
$ nvidia-mig-parted apply -f examples/config.yaml -c all-2g.10gb
$ nvidia-mig-parted apply -f examples/config.yaml -c all-3g.20gb
$ nvidia-mig-parted apply -f examples/config.yaml -c all-balanced
$ nvidia-mig-parted apply -f examples/config.yaml -c custom-config

The currently applied configuration can then be looked up with:

$ nvidia-mig-parted export
version: v1
mig-configs:
  current:
  - devices: all
    mig-enabled: true
    mig-devices:
      1g.5gb: 2
      2g.10gb: 1
      3g.20gb: 1

And asserted with:

$ nvidia-mig-parted assert -f examples/config.yaml -c all-balanced
Selected MIG configuration currently applied

$ echo $?
0

$ nvidia-mig-parted assert -f examples/config.yaml -c all-1g.5gb
ERRO[0000] Assertion failure: selected configuration not currently applied

$ echo $?
1
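
Because assert reports its result through the exit status (0 when the selected configuration is applied, 1 otherwise, as in the transcript above), a script can use it to skip a redundant apply. The following is a minimal sketch rather than a feature of the tool: the function name ensure_mig_config and the MIG_PARTED_BIN override variable are hypothetical names introduced here, and only the apply and assert invocations shown above are used.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: apply a MIG config only when assert says it is
# not already in place. MIG_PARTED_BIN is an assumed override variable
# for this sketch so the binary location can be changed.
MIG_PARTED_BIN="${MIG_PARTED_BIN:-nvidia-mig-parted}"

ensure_mig_config() {
    local config_file="$1" selected="$2"
    # assert exits 0 when the selected config is currently applied.
    if "$MIG_PARTED_BIN" assert -f "$config_file" -c "$selected" >/dev/null 2>&1; then
        echo "unchanged: $selected"
    else
        # Any non-zero exit from assert leads to an apply attempt.
        "$MIG_PARTED_BIN" apply -f "$config_file" -c "$selected" \
            && echo "applied: $selected"
    fi
}

# Example: ensure_mig_config examples/config.yaml all-balanced
```

Note that assert also exits non-zero when it hits an error, so in that case this sketch would still attempt the apply and surface whatever error apply returns.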

Note: The nvidia-mig-parted tool alone does not take care of making sure that your node is in a state where MIG mode changes and MIG device configurations will apply cleanly. Moreover, it does not ensure that MIG device configurations will persist across node reboots.

To help with this, a systemd service and a set of support scripts have been developed to wrap nvidia-mig-parted and provide these much desired features. Please see the README.md under deployments/systemd for more details.

Installing nvidia-mig-parted

At the moment, there is no common distribution platform for nvidia-mig-parted, and the only way to get it is to build it from source. Below are some common methods.

Use docker with go get and go install (the golang image must be 1.16 or newer, because the code calls os.ReadFile, which Go 1.15 lacks):

docker run \
    -v $(pwd):/dest \
    golang:1.16 \
    sh -c "
    export GO111MODULE=off
    go get -u github.com/NVIDIA/mig-parted/cmd/nvidia-mig-parted
    GOBIN=/dest go install github.com/NVIDIA/mig-parted/cmd/nvidia-mig-parted
    "

Run go get and go install directly:

GO111MODULE=off go get -u github.com/NVIDIA/mig-parted/cmd/nvidia-mig-parted
GOBIN=$(pwd)    go install github.com/NVIDIA/mig-parted/cmd/nvidia-mig-parted

Clone the repo and build it:

git clone https://github.com/NVIDIA/mig-parted
cd mig-parted
go build ./cmd/nvidia-mig-parted

When followed exactly, any of these methods should generate a binary called nvidia-mig-parted in your current directory. Once this is done, it is advised that you move this binary to somewhere in your path so you can follow the commands below verbatim.
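
One way to do that is sketched below. The function name install_binary and the default destination /usr/local/bin are assumptions made for this example, not something the project prescribes; any directory listed in PATH works.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a freshly built binary into a directory and
# report whether that directory is on PATH.
# The default destination /usr/local/bin is an assumption of this sketch.
install_binary() {
    local bin="$1" dest="${2:-/usr/local/bin}"
    # install(1) copies the file and sets its mode in one step.
    install -m 0755 "$bin" "$dest/$(basename "$bin")" || return 1
    case ":$PATH:" in
        *":$dest:"*) echo "$dest is on PATH" ;;
        *)           echo "$dest is not on PATH" ;;
    esac
}

# Example (may need root for a system directory):
#   install_binary ./nvidia-mig-parted /usr/local/bin
```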

Quick Start

Before going into the details of every possible option for nvidia-mig-parted, it's useful to walk through a few examples of its most common usage. All commands below use the example configuration file found under examples/config.yaml of this repo.

Apply a specific MIG config from a configuration file

nvidia-mig-parted apply -f examples/config.yaml -c all-1g.5gb

Apply only the MIG mode settings of a config

nvidia-mig-parted apply --mode-only -f examples/config.yaml -c all-1g.5gb

Apply a MIG config with debug output

nvidia-mig-parted -d apply -f examples/config.yaml -c all-1g.5gb

Apply a one-off MIG config without a configuration file

cat <<EOF | nvidia-mig-parted apply -f -
version: v1
mig-configs:
  all-1g.5gb:
  - devices: all
    mig-enabled: true
    mig-devices:
      1g.5gb: 7
EOF

Apply a one-off MIG config to only change the MIG mode

cat <<EOF | nvidia-mig-parted apply --mode-only -f -
version: v1
mig-configs:
  whatever:
  - devices: all
    mig-enabled: true
    mig-devices: {}
EOF

Export the current MIG config

nvidia-mig-parted export

Assert a specific MIG configuration is currently applied

nvidia-mig-parted assert -f examples/config.yaml -c all-1g.5gb

Assert the MIG mode settings of a MIG configuration are currently applied

nvidia-mig-parted assert --mode-only -f examples/config.yaml -c all-1g.5gb

Assert a one-off MIG config without a configuration file

cat <<EOF | nvidia-mig-parted assert -f -
version: v1
mig-configs:
  all-1g.5gb:
  - devices: all
    mig-enabled: true
    mig-devices:
      1g.5gb: 7
EOF

Assert the MIG mode setting of a one-off MIG config

cat <<EOF | nvidia-mig-parted assert --mode-only -f -
version: v1
mig-configs:
  whatever:
  - devices: all
    mig-enabled: true
    mig-devices: {}
EOF
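
The --mode-only form of assert can also answer a narrower question before anything is changed: would applying a given config alter the MIG mode of any GPU? The sketch below wraps that check. The function name mig_mode_would_change and the MIG_PARTED_BIN override variable are hypothetical names for this example; the assert invocation itself is the one shown in the quick start above.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed (exit 0) when the MIG mode settings of the
# selected config are NOT currently applied, i.e. an apply would have to
# change the MIG mode. MIG_PARTED_BIN is an assumed override variable.
MIG_PARTED_BIN="${MIG_PARTED_BIN:-nvidia-mig-parted}"

mig_mode_would_change() {
    local config_file="$1" selected="$2"
    if "$MIG_PARTED_BIN" assert --mode-only -f "$config_file" -c "$selected" >/dev/null 2>&1; then
        return 1   # mode settings already match
    else
        return 0   # mode differs, or assert itself failed
    fi
}

# Example:
#   if mig_mode_would_change examples/config.yaml all-1g.5gb; then
#       echo "MIG mode change required"
#   fi
```

A failing assert does not distinguish a mismatch from an error, so a caller that needs to tell the two apart would have to inspect the tool's output as well.
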
Issues
  • mmap error for most operations on debian 10

    Hi,

    I tried to use mig-parted on a debian 10 system with 6 A100 GPUs installed and backports kernel 5.10 (5.10.0-0.bpo.8-amd64 #1 SMP Debian 5.10.46-4~bpo10+1 (2021-08-07) x86_64 GNU/Linux) as well as kernel 4.19 and get the following error:

    # nvidia-mig-parted -d assert --config-file /etc/nvidia-mig-manager/config.yaml --selected-config all-disabled
    DEBU[0000] Parsing config file...                       
    DEBU[0000] Selecting specific MIG config...             
    DEBU[0000] Asserting MIG mode configuration...          
    DEBU[0000] Walking MigConfig for (devices=all)          
    DEBU[0000]   GPU 0: 0x20F110DE                          
    DEBU[0000]     Asserting MIG mode: Disabled             
    DEBU[0000] Error checking MIG capable: error opening bar0 MMIO resource: failed to open file for mmio: failed to mmap file: invalid argument
     
    FATA[0000] Assertion failure: selected configuration not currently applied
    
    # nvidia-mig-parted -d apply --config-file /etc/nvidia-mig-manager/config.yaml --selected-config all-disabled 
    [...]
    DEBU[0001] Applying MIG mode change...                  
    DEBU[0001] Walking MigConfig for (devices=all)          
    DEBU[0001]   GPU 0: 0x20F110DE                          
    DEBU[0001]     MIG capable: true                        
    DEBU[0001]     Current MIG mode: Disabled               
    DEBU[0001]     Updating MIG mode: Disabled              
    DEBU[0001]     Mode change pending: false               
    DEBU[0001]   GPU 1: 0x20F110DE                          
    DEBU[0001]     MIG capable: true                        
    DEBU[0001]     Current MIG mode: Disabled               
    DEBU[0001]     Updating MIG mode: Disabled              
    DEBU[0001]     Mode change pending: false               
    DEBU[0001]   GPU 2: 0x20F110DE                          
    DEBU[0001]     MIG capable: true                        
    DEBU[0001]     Current MIG mode: Disabled               
    DEBU[0001]     Updating MIG mode: Disabled              
    DEBU[0001]     Mode change pending: false               
    DEBU[0001]   GPU 3: 0x20F110DE                          
    DEBU[0001]     MIG capable: true                        
    DEBU[0001]     Current MIG mode: Disabled               
    DEBU[0001]     Updating MIG mode: Disabled              
    DEBU[0001]     Mode change pending: false               
    DEBU[0001]   GPU 4: 0x20F110DE                          
    DEBU[0001]     MIG capable: true                        
    DEBU[0001]     Current MIG mode: Enabled                
    DEBU[0001]     Updating MIG mode: Disabled              
    DEBU[0003]     Mode change pending: false               
    DEBU[0003]   GPU 5: 0x20F110DE                          
    DEBU[0003]     MIG capable: true                        
    DEBU[0003]     Current MIG mode: Enabled                
    DEBU[0003]     Updating MIG mode: Disabled              
    DEBU[0014]     Mode change pending: false               
    DEBU[0014] Checking current MIG device configuration... 
    DEBU[0014] Walking MigConfig for (devices=all)          
    DEBU[0014]   GPU 0: 0x20F110DE                          
    DEBU[0014] Running pre-apply-config hook
    [...]
    DEBU[0014] Applying MIG device configuration...         
    DEBU[0014] Walking MigConfig for (devices=all)          
    DEBU[0014]   GPU 0: 0x20F110DE                          
    DEBU[0014] Running apply-exit hook
    [....]
    FATA[0015] Error checking MIG capable: error opening bar0 MMIO resource: failed to open file for mmio: failed to mmap file: invalid argument
    

    The change of MIG mode (equivalent to nvidia-smi -mig 0) works fine, but the assertion always fails with this error and setting up MIG instances doesn't work. I tried to debug the mmap failure but couldn't find anything obvious. I also used strace to show the call:

    strace -e trace=%memory nvidia-mig-parted -d assert --config-file /etc/nvidia-mig-manager/vrvis.yaml --selected-config all-disabled
    brk(NULL)                               = 0x1255000
    mmap(NULL, 25926, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f904a11f000
    mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f904a11d000
    mmap(NULL, 132288, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f904a0fc000
    mmap(0x7f904a102000, 61440, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7f904a102000
    mmap(0x7f904a111000, 24576, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15000) = 0x7f904a111000
    mmap(0x7f904a117000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a000) = 0x7f904a117000
    mmap(0x7f904a119000, 13504, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f904a119000
    mmap(NULL, 16656, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f904a0f7000
    mmap(0x7f904a0f8000, 4096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x7f904a0f8000
    mmap(0x7f904a0f9000, 4096, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f904a0f9000
    mmap(0x7f904a0fa000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f904a0fa000
    mmap(NULL, 1837056, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f9049f36000
    mprotect(0x7f9049f58000, 1658880, PROT_NONE) = 0
    mmap(0x7f9049f58000, 1343488, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x22000) = 0x7f9049f58000
    mmap(0x7f904a0a0000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16a000) = 0x7f904a0a0000
    mmap(0x7f904a0ed000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b6000) = 0x7f904a0ed000
    mmap(0x7f904a0f3000, 14336, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f904a0f3000
    mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9049f33000
    mprotect(0x7f904a0ed000, 16384, PROT_READ) = 0
    mprotect(0x7f904a0fa000, 4096, PROT_READ) = 0
    mprotect(0x7f904a117000, 4096, PROT_READ) = 0
    mprotect(0x878000, 4096, PROT_READ)     = 0
    mprotect(0x7f904a14d000, 4096, PROT_READ) = 0
    munmap(0x7f904a11f000, 25926)           = 0
    brk(NULL)                               = 0x1255000
    brk(0x1276000)                          = 0x1276000
    mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9049ef3000
    mmap(NULL, 131072, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9049ed3000
    mmap(NULL, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9049dd3000
    mmap(NULL, 8388608, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f90495d3000
    mmap(NULL, 67108864, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f90455d3000
    mmap(NULL, 536870912, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f90255d3000
    mmap(0xc000000000, 67108864, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xc000000000
    mmap(0xc000000000, 67108864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xc000000000
    mmap(NULL, 33554432, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f90235d3000
    mmap(NULL, 2165768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f90233c2000
    mmap(0x7f9049ed3000, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f9049ed3000
    mmap(0x7f9049e53000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f9049e53000
    mmap(0x7f90499d9000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f90499d9000
    mmap(0x7f9047603000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f9047603000
    mmap(0x7f9035753000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f9035753000
    mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f90232c2000
    mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f90232b2000
    mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f90232a2000
    mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f9022aa1000
    mprotect(0x7f9022aa2000, 8388608, PROT_READ|PROT_WRITE) = 0
    mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f90222a0000
    mprotect(0x7f90222a1000, 8388608, PROT_READ|PROT_WRITE) = 0
    mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f9021a9f000
    mprotect(0x7f9021aa0000, 8388608, PROT_READ|PROT_WRITE) = 0
    --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=6578, si_uid=0} ---
    mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f9020a5d000
    mprotect(0x7f9020a5e000, 8388608, PROT_READ|PROT_WRITE) = 0
    mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9020a1d000
    DEBU[0000] Parsing config file...                       
    DEBU[0000] Selecting specific MIG config...             
    DEBU[0000] Asserting MIG mode configuration...          
    --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=6578, si_uid=0} ---
    mmap(NULL, 1439992, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f90208bd000
    --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=6578, si_uid=0} ---
    DEBU[0000] Walking MigConfig for (devices=all)          
    DEBU[0000]   GPU 0: 0x20F110DE                          
    DEBU[0000]     Asserting MIG mode: Disabled             
    --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=6578, si_uid=0} ---
    --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=6578, si_uid=0} ---
    mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f902086d000
    --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=6578, si_uid=0} ---
    --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=6578, si_uid=0} ---
    mmap(NULL, 16777216, PROT_READ, MAP_SHARED, 3, 0) = -1 EINVAL (Invalid argument)
    DEBU[0000] Error checking MIG capable: error opening bar0 MMIO resource: failed to open file for mmio: failed to mmap file: invalid argument
     
    FATA[0000] Assertion failure: selected configuration not currently applied 
    +++ exited with 1 +++
    

    Thanks for your help, Valentin

    opened by vali-um 10
  • Propagate INSTANCE_PROFILE_6_SLICE constants from NVML

    NVML added GPU_INSTANCE_PROFILE_6_SLICE and COMPUTE_INSTANCE_PROFILE_6_SLICE constants, but these weren't propagated through mig-parted, meaning that 6-slice MIG profiles were not supported. This PR adds those constants to internal/nvml/consts.go and exposes them in pkg/types/mig_profile.go.

    opened by vineel-cruise 4
  • Keep original kubernetes components labels when reconfiguring

    In the Kubernetes DaemonSet:

    If the original Kubernetes labels on the node are:

        nvidia.com/gpu.deploy.dcgm-exporter: "true"
        nvidia.com/gpu.deploy.device-plugin: "false"
        nvidia.com/gpu.deploy.gpu-feature-discovery: "true"

    then after the reconfiguration they are all changed to "true", even though nvidia.com/gpu.deploy.device-plugin was set to "false".

    In this PR: keep each label's original value and reassign it after reconfiguration.

    Signed-off-by: omer-dayan [email protected]

    opened by omer-dayan 4
  • Installing `nvidia-mig-parted` fails.

    Dear all,

    thank you for your work! We used the following installation instructions to build nvidia-mig-parted:

    docker run \
        -v $(pwd):/dest \
        golang:1.15 \
        sh -c "
        GO111MODULE=off go get -u github.com/NVIDIA/mig-parted/cmd/nvidia-mig-parted
        GOBIN=/dest     go install github.com/NVIDIA/mig-parted/cmd/nvidia-mig-parted
        "
    

    I think that exactly this statement worked fine a few weeks ago, but currently it fails with:

    src/github.com/NVIDIA/mig-parted/cmd/util/util.go:100:18: undefined: os.ReadFile
    

    Is this a known problem or did I make some mistake? (I tested it in multiple locations; the error was always the os.ReadFile one from above.)

    Best regards,

    Maik

    opened by mam10eks 2
  • A start job for Configure MIG on NVIDIA GPUs (x min x sec / no limit)

    Hi. I deployed nvidia-mig-manager.service and everything is OK! But after a restart I see the message: "A start job for Configure MIG on NVIDIA GPUs (x min x sec / no limit)". Is this normal? How can I get rid of it?

    opened by dogtown63 2
  • Not to delete mig instance if 'CLIENT_IN_USE' when the permutation needs it anyway

    Not to delete mig instance if 'CLIENT_IN_USE' when the permutation needs to create it anyway

    Example:

    Current config (a process is running on a single 1g.5gb device):

        mig-devices:
          "1g.5gb": 2
          "2g.10gb": 1
          "3g.20gb": 1

    Desired config:

        mig-devices:
          "1g.5gb": 7

    What happens now is a failure: when mig-parted tries to delete all the instances of the current config, it gets CLIENT_IN_USE because of the process running on the 1g.5gb device.

    However, even if it deleted that instance successfully, it would then re-create it because of the desired config.

    In this PR: if you get CLIENT_IN_USE when trying to delete a MIG instance, check whether you would try to create it anyway. If so, don't delete this instance and don't try to create it.

    Signed-off-by: omer-dayan [email protected]

    opened by omer-dayan 1
  • node should reboot once MIG config is enabled or disabled via node label

    Once MIG is enabled or disabled, the nvidia-smi command needs a node reboot to show the correct status. Also, MIG profiles start/stop working properly only after a node reboot.

    opened by dogra-gopal 8
  • How to access a MIG Device ID programmatically

    Hi @klueska, I am looking into assigning a GPU that has been partitioned by MIG inside a Python script where I want to run a PyTorch model.

    We typically do it this way in TorchServe, and now if an A100 GPU is partitioned into 2 GPUs such as "MIG-GPU-63feeb45-94c6-b9cb-78ea-98e9b7a5be6b/0/0" and "MIG-GPU-63feeb45-94c6-b9cb-78ea-98e9b7a5be6b/1/0", what would be a good way to handle it? Is there any tool available that provides this info?

    This MIG GPU ID is not available through the CUDA utilities in PyTorch.

    I appreciate your thoughts.

    opened by HamidShojanazeri 9
  • "7g.40gb" configuration missing in examples/config.yaml

    I think this config is missing in examples/config.yaml, to be exhaustive with the list of configurations compatible with the single strategy

          all-7g.40gb:
            - devices: all
              mig-enabled: true
              mig-devices:
                "7g.40gb": 1
    
    opened by kpouget 2
  • Missing feature: show the spec of a MIG profile

    It would be nice if we could use the tool to query the content of a configuration file (the second command is the most important):

    # nvidia-mig-parted show -f examples/config.yaml
    version: v1
    mig-configs:
      - all-disabled
      - all-enabled
      -  ...
    
    # nvidia-mig-parted show -f examples/config.yaml -c all-balanced
    version: v1
    all-balanced:
      - devices: all
        mig-enabled: true
        mig-devices:
          "1g.5gb": 2
          "2g.10gb": 1
          "3g.20gb": 1
    

    Along with nvidia-mig-parted export, this would allow a script to detect whether applying a profile would change the state of the mig-enabled property.

    opened by kpouget 8
Releases(v0.4.2)
Owner
NVIDIA Corporation