Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications

Overview

Nomad is a simple and flexible workload orchestrator for deploying and managing containers (Docker, Podman), non-containerized applications (executables, Java), and virtual machines (QEMU) across on-premises and cloud environments at scale.

Nomad is supported on Linux, Windows, and macOS. A commercial version of Nomad, Nomad Enterprise, is also available.

Nomad provides several key features:

  • Deploy Containers and Legacy Applications: Nomad’s flexibility as an orchestrator enables an organization to run containers, legacy, and batch applications together on the same infrastructure. Through pluggable task drivers, Nomad brings core orchestration benefits to legacy applications without requiring them to be containerized.

  • Simple & Reliable: Nomad runs as a single binary and is entirely self-contained, combining resource management and scheduling into a single system. Nomad does not require any external services for storage or coordination. Nomad automatically handles application, node, and driver failures. Nomad is distributed and resilient, using leader election and state replication to provide high availability in the event of failures.

  • Device Plugins & GPU Support: Nomad offers built-in support for GPU workloads such as machine learning (ML) and artificial intelligence (AI). Nomad uses device plugins to automatically detect and utilize resources from hardware devices such as GPUs, FPGAs, and TPUs.

  • Federation for Multi-Region, Multi-Cloud: Nomad was designed to support infrastructure at a global scale. Nomad supports federation out-of-the-box and can deploy applications across multiple regions and clouds.

  • Proven Scalability: Nomad is optimistically concurrent, which increases throughput and reduces latency for workloads. Nomad has been proven to scale to clusters of 10K+ nodes in real-world production environments.

  • HashiCorp Ecosystem: Nomad integrates seamlessly with Terraform, Consul, and Vault for provisioning, service discovery, and secrets management.

Quick Start

Testing

See Learn: Getting Started for instructions on setting up a local Nomad cluster for non-production use.
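
For a quick feel of what a job looks like, here is a minimal example job specification. This is an illustrative sketch only: the "dc1" datacenter name, the redis:7 image, and the resource sizes are assumptions chosen for the example.

    # example.nomad.hcl: a minimal service job (illustrative sketch)
    job "example" {
      datacenters = ["dc1"] # assumed datacenter name
      type        = "service"

      group "cache" {
        count = 1

        network {
          port "db" {
            to = 6379
          }
        }

        task "redis" {
          driver = "docker" # requires the Docker task driver on the client

          config {
            image = "redis:7" # assumed image
            ports = ["db"]
          }

          resources {
            cpu    = 100 # MHz
            memory = 128 # MB
          }
        }
      }
    }

A file like this can be run against a local dev agent (nomad agent -dev) with nomad job run example.nomad.hcl.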

Optionally, find Terraform manifests for bringing up a development Nomad cluster on a public cloud in the terraform directory.

Production

See Learn: Nomad Reference Architecture for recommended practices and a reference architecture for production deployments.

Documentation

Comprehensive documentation is available on the Nomad website: https://www.nomadproject.io/docs

Guides are available on HashiCorp Learn.

Contributing

See the contributing directory for more developer documentation.

Comments
  • Persistent data on nodes

    Persistent data on nodes

    Nomad should have some way for tasks to acquire persistent storage on nodes. In a lot of cases, we might want to run our own HDFS or Ceph cluster on Nomad.

    That means things like HDFS datanodes need to be able to reserve persistent storage on the node they are launched on. If the whole cluster goes down, once it is brought back up, the appropriate tasks should be launched on their original nodes (where possible), so that they can regain access to the data they previously wrote.
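
    Since this was filed, Nomad has added host volumes, which cover part of this request. A rough sketch of how that looks (the volume name, path, and image here are illustrative assumptions, not from this issue):

        # Client agent configuration: expose a directory on the node as a host volume
        client {
          host_volume "hdfs-data" {
            path      = "/mnt/hdfs-data" # assumed path on the node
            read_only = false
          }
        }

        # Job specification: claim the host volume and mount it into the task
        group "datanode" {
          volume "hdfs-data" {
            type      = "host"
            source    = "hdfs-data"
            read_only = false
          }

          task "datanode" {
            driver = "docker"

            config {
              image = "apache/hadoop:3" # placeholder image
            }

            volume_mount {
              volume      = "hdfs-data"
              destination = "/data"
            }
          }
        }

    This keeps data on the node; pinning a task back to the same node after a full outage would still need a constraint.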

    type/enhancement theme/scheduling 
    opened by F21 117
  • Specify logging driver and options for docker driver

    Specify logging driver and options for docker driver

    Please correct me if I am wrong, but I couldn't find in the documentation how to pass the log-driver and log-opt arguments to containers when running them as Nomad tasks, e.g.: --log-driver=awslogs --log-opt awslogs-region=us-east-1 --log-opt awslogs-group=myLogGroup --log-opt awslogs-stream=myLogStream

    I know I can configure the Docker daemon with these arguments, but then I can't specify different log streams for each container. If this is currently not possible, I would like to request it as a feature. Thank you.
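
    For what it's worth, the docker driver has since gained a logging block in the task config that covers this case. A sketch using the awslogs options from the example above (the image name is a placeholder):

        task "app" {
          driver = "docker"

          config {
            image = "my-app:latest" # placeholder image

            logging {
              type = "awslogs"
              config {
                awslogs-region = "us-east-1"
                awslogs-group  = "myLogGroup"
                awslogs-stream = "myLogStream"
              }
            }
          }
        }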

    type/enhancement theme/driver/docker 
    opened by pajel 77
  • Constraint

    Constraint "CSI volume has exhausted its available writer claims": 1 nodes excluded by filter

    Nomad version

    Nomad v1.1.2 (60638a086ef9630e2a9ba1e237e8426192a44244)

    Operating system and Environment details

    Ubuntu 20.04 LTS

    Issue

    Cannot re-plan jobs due to CSI volumes being claimed. I have seen many variations of this issue and don't know how to debug it. I use the ceph-csi plugin, deployed as a system job on my two Nomad nodes, which results in two controllers and two ceph-csi nodes. I then create a few volumes using the nomad volume create command, and then create a job with three tasks that use three volumes. Sometimes, after a while, the job fails and I stop it. After that, when I try to re-plan the exact same job, I get this error.

    What confuses me is that the warning differs every time I run job plan. First I saw

    - WARNING: Failed to place all allocations.
      Task Group "zookeeper1" (failed to place 1 allocation):
        * Constraint "CSI volume zookeeper1-data has exhausted its available writer claims": 2 nodes excluded by filter
    
      Task Group "zookeeper2" (failed to place 1 allocation):
        * Constraint "CSI volume zookeeper2-data has exhausted its available writer claims": 2 nodes excluded by filter
    

    Then, running job plan again a few seconds later, I got

    - WARNING: Failed to place all allocations.
      Task Group "zookeeper1" (failed to place 1 allocation):
        * Constraint "CSI volume zookeeper1-datalog has exhausted its available writer claims": 2 nodes excluded by filter
    
      Task Group "zookeeper2" (failed to place 1 allocation):
        * Constraint "CSI volume zookeeper2-datalog has exhausted its available writer claims": 2 nodes excluded by filter
    

    Then again,

    - WARNING: Failed to place all allocations.
      Task Group "zookeeper1" (failed to place 1 allocation):
        * Constraint "CSI volume zookeeper1-data has exhausted its available writer claims": 1 nodes excluded by filter
        * Constraint "CSI volume zookeeper1-datalog has exhausted its available writer claims": 1 nodes excluded by filter
    
      Task Group "zookeeper2" (failed to place 1 allocation):
        * Constraint "CSI volume zookeeper2-datalog has exhausted its available writer claims": 2 nodes excluded by filter
    

    I have three groups: zookeeper1, zookeeper2, and zookeeper3, each using two volumes (data and datalog). I will just assume from this log that all volumes are non-reclaimable.

    This is the output of nomad volume status.

    Container Storage Interface
    ID                  Name                Plugin ID  Schedulable  Access Mode
    zookeeper1-data     zookeeper1-data     ceph-csi   true         single-node-writer
    zookeeper1-datalog  zookeeper1-datalog  ceph-csi   true         single-node-writer
    zookeeper2-data     zookeeper2-data     ceph-csi   true         single-node-writer
    zookeeper2-datalog  zookeeper2-datalog  ceph-csi   true         single-node-writer
    zookeeper3-data     zookeeper3-data     ceph-csi   true         <none>
    zookeeper3-datalog  zookeeper3-datalog  ceph-csi   true         <none>
    

    It says that they are schedulable. This is the output of nomad volume status zookeeper1-datalog:

    ID                   = zookeeper1-datalog
    Name                 = zookeeper1-datalog
    External ID          = 0001-0024-72f28a72-0434-4045-be3a-b5165287253f-0000000000000003-72ec315b-e9f5-11eb-8af7-0242ac110002
    Plugin ID            = ceph-csi
    Provider             = cephfs.nomad.example.com
    Version              = v3.3.1
    Schedulable          = true
    Controllers Healthy  = 2
    Controllers Expected = 2
    Nodes Healthy        = 2
    Nodes Expected       = 2
    Access Mode          = single-node-writer
    Attachment Mode      = file-system
    Mount Options        = <none>
    Namespace            = default
    
    Allocations
    No allocations placed
    

    It says there are no allocations placed.

    Reproduction steps

    This is unfortunately flaky, but it most likely happens when a job fails, is stopped, and is then re-planned. It persists even after I purge the job with nomad job stop -purge. Running nomad system gc, nomad system reconcile summary, or restarting Nomad does not help.

    Expected Result

    I should be able to reclaim the volume again without having to detach, or deregister -force and register again. I created the volumes using nomad volume create, so those volumes have their external IDs all generated. There are 6 volumes and 2 nodes, so I don't want to type detach 12 times every time this happens (and it happens frequently).

    Actual Result

    See error logs above.

    Job file (if appropriate)

    I have three groups (zookeeper1, zookeeper2, zookeeper3), each with a volume stanza like this (each group uses its own volumes; this one is for zookeeper2):

        volume "data" {
          type = "csi"
          read_only = false
          source = "zookeeper2-data"
          attachment_mode = "file-system"
          access_mode     = "single-node-writer"
    
          mount_options {
            fs_type     = "ext4"
            mount_flags = ["noatime"]
          }
        }
        volume "datalog" {
          type = "csi"
          read_only = false
          source = "zookeeper2-datalog"
          attachment_mode = "file-system"
          access_mode     = "single-node-writer"
    
          mount_options {
            fs_type     = "ext4"
            mount_flags = ["noatime"]
          }
        }
    

    All groups have count = 1.
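
    For reference, the volumes were created with nomad volume create from volume specifications along these lines (a sketch reconstructed from the stanzas above; the file name and capacity values are assumptions):

        # zookeeper2-data.volume.hcl (illustrative)
        id        = "zookeeper2-data"
        name      = "zookeeper2-data"
        type      = "csi"
        plugin_id = "ceph-csi"

        capacity_min = "1GiB" # assumed size
        capacity_max = "1GiB"

        capability {
          access_mode     = "single-node-writer"
          attachment_mode = "file-system"
        }

        mount_options {
          fs_type     = "ext4"
          mount_flags = ["noatime"]
        }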

    type/bug theme/storage stage/accepted 
    opened by gregory112 74
  • Ability to select private/public IP for specific task/port

    Ability to select private/public IP for specific task/port

    Extracted from #209.

    We use Nomad with the Docker driver to operate a cluster of machines. Some of them have both public and private interfaces. These two-NIC machines run internal services that need to listen only on a private interface, as well as public services, which should listen on a public interface.

    So we need a way of specifying whether a task should listen on a public or private IP.

    I think this can be generalized to the ability to specify a subnet mask for a specific port:

    resources {
        network {
            mbits = 100
            port "http" {
                # Listen on all interfaces that match this mask; the task will not be
                # started on a machine that has no NICs with IPs in this subnet.
                netmask = "10.10.0.1/16"
            }
            port "internal-bus" {
                # The same with static port number
                static = 4050
                netmask = "127.0.0.1/32"
            }
        }
    }
    

    This would be the most flexible solution and would cover most, if not all, cases. For example, to listen on all interfaces, as requested in #209, you would just pass a 0.0.0.0/0 netmask that matches all possible IPs. Maybe it makes sense to make this netmask the default, i.e. bind to all interfaces if no netmask is specified for a port.

    I think this is a really important feature, because its absence prevents people from running Nomad in VPC (virtual private cloud) environments, like Amazon VPC, Google Cloud Platform with subnetworks, OVH Dedicated Cloud, and many others, as well as in any other environment where some machines are connected to more than one network.


    Another solution is to allow specifying interface name(s), like eth0, but that wouldn't work in our case because:

    1. different machines may have a different order and, thus, different names of network interfaces;
    2. to make things worse, some machines may have multiple IPs assigned to the same interface; e.g. see DigitalOcean's anchor IP, which is enabled by default on each new machine.

    Example for point 1: assume that I want to start some task on all machines in the cluster, and that I want this task to listen only on the private interface to avoid exposing it to the outside world. The Consul agent is a good example of such a service.

    Now, some machines in the cluster are connected to both public and private networks, and have two NICs:

    • eth0 corresponds to public network, say, 162.243.197.49/24;
    • eth1 corresponds to my private network 10.10.0.1/24.

    But the majority of machines are connected only to the private network and have only one NIC:

    • eth0 corresponds to the private net 10.10.0.1/24.

    This is a fairly typical setup in VPC environments.

    You can see that it would be impossible to constrain my service to the private subnet by specifying an interface name, because eth0 corresponds to different networks on different machines, and eth1 is missing entirely on some machines.
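
    Since this was filed, Nomad has added host networks, which work roughly along the lines proposed here. A sketch using the subnets from the example above (the network names are illustrative):

        # Client agent configuration: name host networks by CIDR
        client {
          host_network "private" {
            cidr = "10.10.0.0/16"
          }
          host_network "public" {
            cidr = "162.243.197.0/24"
          }
        }

        # Job specification: bind each port to a named host network
        network {
          port "http" {
            host_network = "private"
          }
          port "internal-bus" {
            static       = 4050
            host_network = "private"
          }
        }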

    type/enhancement theme/networking stage/thinking 
    opened by skozin 73
  • Provide for dependencies between tasks in a group

    Provide for dependencies between tasks in a group

    Tasks in a group sometimes need to be ordered to start up correctly.

    For example, to support the Ambassador pattern, proxy containers (P[n]) used for outbound request routing by a dependent application may be started only after the dependent application (A) is started. This is because Docker needs to know the name of A to configure shared-container networking when launching P[n].

    As a first approximation, ordering can be simple, e.g. by treating the task list in a group as an ordered array.
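
    Nomad has since added task lifecycle hooks, which cover this kind of ordering. A sketch of the Ambassador-style setup described above (task and image names are placeholders):

        group "app" {
          # Main task: the dependent application (A) starts first.
          task "app" {
            driver = "docker"
            config {
              image = "app:latest" # placeholder image
            }
          }

          # Proxy task (P): started only after the main task is running,
          # and kept running alongside it as a sidecar.
          task "proxy" {
            lifecycle {
              hook    = "poststart"
              sidecar = true
            }

            driver = "docker"
            config {
              image = "proxy:latest" # placeholder image
            }
          }
        }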

    type/enhancement theme/core stage/accepted 
    opened by mfischer-zd 66
  • HTTP UI like consul-ui

    HTTP UI like consul-ui

    Much like consul-ui, it would be nice to have a nomad-ui project for visually accessing and modifying jobs, etc.

    Until Nomad has its own native UI, jippi/hashi-ui provides a Nomad and Consul UI.

    type/enhancement stage/thinking 
    opened by jippi 62
  • Tens of thousands of open file descriptors to a single nomad alloc logs directory

    Tens of thousands of open file descriptors to a single nomad alloc logs directory

    Nomad version

    Nomad v0.4.1

    Operating system and Environment details

    Linux ip-10-201-5-129 4.4.0-47-generic #68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

    Issue

    Nomad has tens of thousands of open file descriptors to an alloc log directory.

    nomad 2143 root *530r DIR 202,80 4096 8454154 /var/lib/ssi/nomad/alloc/14e62a40-8598-2fed-405e-ca237bc940c6/alloc/logs

    Something similar to that is repeated ~60,000 times; lsof -p 2143 | wc -l returns ~60,000.

    I stopped the alloc but the descriptors are still there.

    In addition, the nomad process is approaching 55 GB of memory used.

    type/bug theme/api stage/needs-investigation 
    opened by sheldonkwok 57
  • Unable to get nomad config/get template function_denylist option

    Unable to get nomad config/get template function_denylist option

    @notnoop @tgross hi guys! I upgraded to 1.2.4 but got another issue with Consul templating:

    Jan 25 08:44:53 microworker03.te01-shr.nl3 nomad: 2022-01-25T08:44:53.234Z [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=a7c04d65-2f29-c778-c34c-2513d29f25f4 task=worker-mpi-resolver @module=logmon path=/data/nomad/alloc/a7c04d65-2f29-c778-c34c-2513d29f25f4/alloc/logs/.worker-mpi-resolver.stdout.fifo timestamp=2022-01-25T08:44:53.234Z
    Jan 25 08:44:53 microworker03.te01-shr.nl3 nomad[4342]: client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=a7c04d65-2f29-c778-c34c-2513d29f25f4 task=worker-mpi-resolver @module=logmon path=/data/nomad/alloc/a7c04d65-2f29-c778-c34c-2513d29f25f4/alloc/logs/.worker-mpi-resolver.stdout.fifo timestamp=2022-01-25T08:44:53.234Z
    Jan 25 08:44:53 microworker03.te01-shr.nl3 nomad: 2022-01-25T08:44:53.234Z [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=a7c04d65-2f29-c778-c34c-2513d29f25f4 task=worker-mpi-resolver @module=logmon path=/data/nomad/alloc/a7c04d65-2f29-c778-c34c-2513d29f25f4/alloc/logs/.worker-mpi-resolver.stderr.fifo timestamp=2022-01-25T08:44:53.234Z
    Jan 25 08:44:53 microworker03.te01-shr.nl3 nomad[4342]: client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=a7c04d65-2f29-c778-c34c-2513d29f25f4 task=worker-mpi-resolver @module=logmon path=/data/nomad/alloc/a7c04d65-2f29-c778-c34c-2513d29f25f4/alloc/logs/.worker-mpi-resolver.stderr.fifo timestamp=2022-01-25T08:44:53.234Z
    Jan 25 08:44:53 microworker03.te01-shr.nl3 nomad: 2022-01-25T08:44:53.965Z [INFO]  agent: (runner) creating new runner (dry: false, once: false)
    Jan 25 08:44:53 microworker03.te01-shr.nl3 nomad[4342]: agent: (runner) creating new runner (dry: false, once: false)
    Jan 25 08:44:53 microworker03.te01-shr.nl3 nomad: 2022-01-25T08:44:53.966Z [INFO]  agent: (runner) creating watcher
    Jan 25 08:44:53 microworker03.te01-shr.nl3 nomad: 2022-01-25T08:44:53.966Z [INFO]  agent: (runner) starting
    Jan 25 08:44:53 microworker03.te01-shr.nl3 nomad[4342]: agent: (runner) creating watcher
    Jan 25 08:44:53 microworker03.te01-shr.nl3 nomad[4342]: agent: (runner) starting
    Jan 25 08:44:54 microworker03.te01-shr.nl3 nomad: 2022-01-25T08:44:54.307Z [INFO]  client.gc: marking allocation for GC: alloc_id=a7c04d65-2f29-c778-c34c-2513d29f25f4
    Jan 25 08:44:54 microworker03.te01-shr.nl3 nomad[4342]: client.gc: marking allocation for GC: alloc_id=a7c04d65-2f29-c778-c34c-2513d29f25f4
    Jan 25 08:44:58 microworker03.te01-shr.nl3 nomad: 2022-01-25T08:44:58.309Z [WARN]  client.alloc_runner.task_runner.task_hook.logmon.nomad: timed out waiting for read-side of process output pipe to close: alloc_id=a7c04d65-2f29-c778-c34c-2513d29f25f4 task=worker-mpi-resolver @module=logmon timestamp=2022-01-25T08:44:58.309Z
    Jan 25 08:44:58 microworker03.te01-shr.nl3 nomad[4342]: client.alloc_runner.task_runner.task_hook.logmon.nomad: timed out waiting for read-side of process output pipe to close: alloc_id=a7c04d65-2f29-c778-c34c-2513d29f25f4 task=worker-mpi-resolver @module=logmon timestamp=2022-01-25T08:44:58.309Z
    Jan 25 08:44:58 microworker03.te01-shr.nl3 nomad: 2022-01-25T08:44:58.309Z [WARN]  client.alloc_runner.task_runner.task_hook.logmon.nomad: timed out waiting for read-side of process output pipe to close: alloc_id=a7c04d65-2f29-c778-c34c-2513d29f25f4 task=worker-mpi-resolver @module=logmon timestamp=2022-01-25T08:44:58.309Z
    Jan 25 08:44:58 microworker03.te01-shr.nl3 nomad[4342]: client.alloc_runner.task_runner.task_hook.logmon.nomad: timed out waiting for read-side of process output pipe to close: alloc_id=a7c04d65-2f29-c778-c34c-2513d29f25f4 task=worker-mpi-resolver @module=logmon timestamp=2022-01-25T08:44:58.309Z
    

    Nomad side:

    Template failed: /data/nomad/alloc/3a20b272-9965-8c1f-6ab0-c841e303b623/worker-mpi-resolver/local/platformConfig/nl3.tmpl: execute: template: :1:36: executing "" at <plugin "/data/tools/consul.php">: error calling plugin: function is disabled
    

    Originally posted by @bubejur in https://github.com/hashicorp/nomad/issues/11547#issuecomment-1020940729
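
    For reference, the knob involved here is the template block in the client agent configuration. A sketch of what re-enabling the plugin function would look like (whether that is desirable for a given cluster is a separate question):

        # Client agent configuration (illustrative)
        client {
          enabled = true

          template {
            # "plugin" is in the default denylist; an empty list re-enables it.
            function_denylist = []
          }
        }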

    type/bug stage/needs-investigation 
    opened by bubejur 56
  • high memory usage in logmon

    high memory usage in logmon

    I have a cluster of 20 nodes, all running "raw-exec" tasks in PHP.

    At random intervals, after a while, I get a lot of OOMs. I find the server at 100% swap usage, looking like this:

    Screenshot 2021-01-20 at 17 45 21

    If I restart the Nomad agent, it all goes back to normal for a while.

    I also get this in the Nomad log: "2021-01-20T17:52:28.522+0200 [INFO] client.gc: garbage collection skipped because no terminal allocations: reason="number of allocations (89) is over the limit (50)" <--- that message is extremely ambiguous, as everything runs normally and Nomad was just restarted.

    type/bug stage/waiting-reply theme/logging theme/resource-utilization stage/needs-investigation 
    opened by anastazya 55
  • Successfully completed batch job is re-run with new allocation.

    Successfully completed batch job is re-run with new allocation.

    Nomad version

    Nomad v0.8.3 (c85483da3471f4bd3a7c3de112e95f551071769f)

    Operating system and Environment details

    3.10.0-327.36.3.el7.x86_64

    Issue

    A batch job executed and completed successfully; then, several hours later, when the allocation was garbage collected, the job was re-run.

    Reproduction steps

    Not sure. Seems to be happening frequently on our cluster though.

    Nomad logs

        2018/05/14 23:30:50.541581 [DEBUG] worker: dequeued evaluation 5e2dfa95-ce49-9e4d-621d-d0e900d6c3aa
        2018/05/14 23:30:50.541765 [DEBUG] sched: <Eval "5e2dfa95-ce49-9e4d-621d-d0e900d6c3aa" JobID: "REDACTED-9d565598-63f2-4c2d-b506-5ae64d4397a2" Namespace: "default">: Total changes: (place 1) (destructive 0) (inplace 0) (stop 0)
        2018/05/14 23:30:50.546101 [DEBUG] worker: submitted plan at index 355103 for evaluation 5e2dfa95-ce49-9e4d-621d-d0e900d6c3aa
        2018/05/14 23:30:50.546140 [DEBUG] sched: <Eval "5e2dfa95-ce49-9e4d-621d-d0e900d6c3aa" JobID: "REDACTED-9d565598-63f2-4c2d-b506-5ae64d4397a2" Namespace: "default">: setting status to complete
        2018/05/14 23:30:50.547618 [DEBUG] worker: updated evaluation <Eval "5e2dfa95-ce49-9e4d-621d-d0e900d6c3aa" JobID: "REDACTED-9d565598-63f2-4c2d-b506-5ae64d4397a2" Namespace: "default">
        2018/05/14 23:30:50.547683 [DEBUG] worker: ack for evaluation 5e2dfa95-ce49-9e4d-621d-d0e900d6c3aa
    
        2018/05/14 23:30:52.074437 [DEBUG] client: starting task runners for alloc '5d0016ac-dd71-6626-f929-6398e80ef28e'
        2018/05/14 23:30:52.074769 [DEBUG] client: starting task context for 'REDACTED-task' (alloc '5d0016ac-dd71-6626-f929-6398e80ef28e')
    2018-05-14T23:30:52.085-0400 [DEBUG] plugin: starting plugin: path=REDACTED/bin/nomad args="[REDACTED/nomad executor {"LogFile":"REDACTED/alloc/5d0016ac-dd71-6626-f929-6398e80ef28e/REDACTED-task/executor.out","LogLevel":"DEBUG"}]"
        2018/05/14 23:34:32.288406 [INFO] client: task "REDACTED-task" for alloc "5d0016ac-dd71-6626-f929-6398e80ef28e" completed successfully
        2018/05/14 23:34:32.288438 [INFO] client: Not restarting task: REDACTED-task for alloc: 5d0016ac-dd71-6626-f929-6398e80ef28e
        2018/05/14 23:34:32.289213 [INFO] client.gc: marking allocation 5d0016ac-dd71-6626-f929-6398e80ef28e for GC
        2018/05/15 01:39:13.888635 [INFO] client.gc: garbage collecting allocation 5d0016ac-dd71-6626-f929-6398e80ef28e due to new allocations and over max (500)
        2018/05/15 01:39:15.389175 [WARN] client: failed to broadcast update to allocation "5d0016ac-dd71-6626-f929-6398e80ef28e"
        2018/05/15 01:39:15.389401 [INFO] client.gc: marking allocation 5d0016ac-dd71-6626-f929-6398e80ef28e for GC
        2018/05/15 01:39:15.390656 [DEBUG] client: terminating runner for alloc '5d0016ac-dd71-6626-f929-6398e80ef28e'
        2018/05/15 01:39:15.390714 [DEBUG] client.gc: garbage collected "5d0016ac-dd71-6626-f929-6398e80ef28e"
        2018/05/15 04:25:10.541590 [INFO] client.gc: garbage collecting allocation 5d0016ac-dd71-6626-f929-6398e80ef28e due to new allocations and over max (500)
        2018/05/15 04:25:10.541626 [DEBUG] client.gc: garbage collected "5d0016ac-dd71-6626-f929-6398e80ef28e"
        2018/05/15 05:46:37.119467 [DEBUG] worker: dequeued evaluation e15e469e-e4f5-2192-207b-84f6a17fd25f
        2018/05/15 05:46:37.139904 [DEBUG] sched: <Eval "e15e469e-e4f5-2192-207b-84f6a17fd25f" JobID: "REDACTED-9d565598-63f2-4c2d-b506-5ae64d4397a2" Namespace: "default">: Total changes: (place 1) (destructive 0) (inplace 0) (stop 0)
        2018/05/15 05:46:37.169051 [INFO] client.gc: marking allocation 5d0016ac-dd71-6626-f929-6398e80ef28e for GC
        2018/05/15 05:46:37.169149 [INFO] client.gc: garbage collecting allocation 5d0016ac-dd71-6626-f929-6398e80ef28e due to forced collection
        2018/05/15 05:46:37.169194 [DEBUG] client.gc: garbage collected "5d0016ac-dd71-6626-f929-6398e80ef28e"
        2018/05/15 05:46:37.177470 [DEBUG] worker: submitted plan at index 373181 for evaluation e15e469e-e4f5-2192-207b-84f6a17fd25f
        2018/05/15 05:46:37.177516 [DEBUG] sched: <Eval "e15e469e-e4f5-2192-207b-84f6a17fd25f" JobID: "REDACTED-9d565598-63f2-4c2d-b506-5ae64d4397a2" Namespace: "default">: setting status to complete
        2018/05/15 05:46:37.179391 [DEBUG] worker: updated evaluation <Eval "e15e469e-e4f5-2192-207b-84f6a17fd25f" JobID: "REDACTED-9d565598-63f2-4c2d-b506-5ae64d4397a2" Namespace: "default">
        2018/05/15 05:46:37.179783 [DEBUG] worker: ack for evaluation e15e469e-e4f5-2192-207b-84f6a17fd25f
        2018/05/15 05:46:40.218701 [DEBUG] client: starting task runners for alloc '928b0562-b7ed-a3c7-d989-89519edadee9'
        2018/05/15 05:46:40.218982 [DEBUG] client: starting task context for 'REDACTED-task' (alloc '928b0562-b7ed-a3c7-d989-89519edadee9')
    2018-05-15T05:46:40.230-0400 [DEBUG] plugin: starting plugin: path=REDACTED/bin/nomad args="[REDACTED/nomad executor {"LogFile":"REDACTED/alloc/928b0562-b7ed-a3c7-d989-89519edadee9/REDACTED-task/executor.out","LogLevel":"DEBUG"}]"
        2018/05/15 11:50:17.836313 [INFO] client: task "REDACTED-task" for alloc "928b0562-b7ed-a3c7-d989-89519edadee9" completed successfully
        2018/05/15 11:50:17.836336 [INFO] client: Not restarting task: REDACTED-task for alloc: 928b0562-b7ed-a3c7-d989-89519edadee9
        2018/05/15 11:50:17.836698 [INFO] client.gc: marking allocation 928b0562-b7ed-a3c7-d989-89519edadee9 for GC
    

    Job file (if appropriate)

    {
        "Job": {
            "AllAtOnce": false,
            "Constraints": [
                {
                    "LTarget": "${node.unique.id}",
                    "Operand": "=",
                    "RTarget": "52c7e5be-a5a0-3a34-1051-5209a91a0197"
                }
            ],
            "CreateIndex": 393646,
            "Datacenters": [
                "dc1"
            ],
            "ID": "REDACTED-9d565598-63f2-4c2d-b506-5ae64d4397a2",
            "JobModifyIndex": 393646,
            "Meta": null,
            "Migrate": null,
            "ModifyIndex": 393673,
            "Name": "REDACTED",
            "Namespace": "default",
            "ParameterizedJob": null,
            "ParentID": "REDACTED/dispatch-1526033570-3cdd72d9",
            "Payload": null,
            "Periodic": null,
            "Priority": 50,
            "Region": "global",
            "Reschedule": null,
            "Stable": false,
            "Status": "dead",
            "StatusDescription": "",
            "Stop": false,
            "SubmitTime": 1526403442162340993,
            "TaskGroups": [
                {
                    "Constraints": [
                        {
                            "LTarget": "${attr.os.signals}",
                            "Operand": "set_contains",
                            "RTarget": "SIGTERM"
                        }
                    ],
                    "Count": 1,
                    "EphemeralDisk": {
                        "Migrate": false,
                        "SizeMB": 300,
                        "Sticky": false
                    },
                    "Meta": null,
                    "Migrate": null,
                    "Name": "REDACTED",
                    "ReschedulePolicy": {
                        "Attempts": 1,
                        "Delay": 5000000000,
                        "DelayFunction": "constant",
                        "Interval": 86400000000000,
                        "MaxDelay": 0,
                        "Unlimited": false
                    },
                    "RestartPolicy": {
                        "Attempts": 1,
                        "Delay": 15000000000,
                        "Interval": 86400000000000,
                        "Mode": "fail"
                    },
                    "Tasks": [
                        {
                            "Artifacts": null,
                            "Config": {
                                "command": "REDACTED",
                                "args": [REDACTED]
                            },
                            "Constraints": null,
                            "DispatchPayload": null,
                            "Driver": "raw_exec",
                            "Env": {REDACTED},
                            "KillSignal": "SIGTERM",
                            "KillTimeout": 5000000000,
                            "Leader": false,
                            "LogConfig": {
                                "MaxFileSizeMB": 10,
                                "MaxFiles": 10
                            },
                            "Meta": null,
                            "Name": "REDACTED",
                            "Resources": {
                                "CPU": 100,
                                "DiskMB": 0,
                                "IOPS": 0,
                                "MemoryMB": 256,
                                "Networks": null
                            },
                            "Services": null,
                            "ShutdownDelay": 0,
                            "Templates": null,
                            "User": "",
                            "Vault": null
                        }
                    ],
                    "Update": null
                }
            ],
            "Type": "batch",
            "Update": {
                "AutoRevert": false,
                "Canary": 0,
                "HealthCheck": "",
                "HealthyDeadline": 0,
                "MaxParallel": 0,
                "MinHealthyTime": 0,
                "Stagger": 0
            },
            "VaultToken": "",
            "Version": 0
        }
    }
    

    What I can tell you for sure is that the allocation ran to completion and exited successfully.

    We're going to try turning off the reschedule and restart policies to see if that has any effect since we're taking care of re-running these on any sort of job failure anyway.

    type/bug stage/needs-investigation 
    opened by nugend 48
  • failed to submit plan for evaluation: ... no such key \"\" in keyring error after moving cluster to 1.4.1

    failed to submit plan for evaluation: ... no such key \"\" in keyring error after moving cluster to 1.4.1

    Nomad version

    Nomad v1.4.1 (2aa7e66bdb526e25f59883952d74dad7ea9a014e)

    Operating system and Environment details

    Ubuntu 22.04, Nomad 1.4.1

    Issue

    After moving the Nomad server and clients to v1.4.1, I noticed that sometimes (unfortunately not always) after cycling Nomad server ASGs and Nomad client ASGs with new AMIs, jobs scheduled on the workers can't be allocated. So to be precise:

    1. Pipeline creates new Nomad AMIs via Packer
    2. Pipeline terraforms Nomad server ASG with server config
    3. Pipeline terraforms client ASG or dedicated instances with updated AMI
    4. Lost jobs on the worker (for instance, the Traefik ingress job) can't be allocated

    This literally never happened before 1.4.x.

    Client output looks like this:

    nomad eval list

    ID        Priority  Triggered By      Job ID                Namespace  Node ID  Status   Placement Failures
    427e9905  50        failed-follow-up  plugin-aws-ebs-nodes  default             pending  false
    35f4fdfb  50        failed-follow-up  plugin-aws-efs-nodes  default             pending  false
    46152dcd  50        failed-follow-up  spot-drainer          default             pending  false
    71e3e58a  50        failed-follow-up  plugin-aws-ebs-nodes  default             pending  false
    e86177a6  50        failed-follow-up  plugin-aws-efs-nodes  default             pending  false
    2289ba5f  50        failed-follow-up  spot-drainer          default             pending  false
    da3fdad6  50        failed-follow-up  plugin-aws-ebs-nodes  default             pending  false
    b445b976  50        failed-follow-up  plugin-aws-efs-nodes  default             pending  false
    48a6771e  50        failed-follow-up  ingress               default             pending  false

    Reproduction steps

    Unclear at this point. I seem to be able to somewhat force the issue when I cycle the Nomad server ASG with updated AMIs.

    Expected Result

    Client work that was lost should be rescheduled once the Nomad client comes up and reports readiness.

    Actual Result

    Lost jobs can't be allocated on workers with an updated AMI.

    nomad status

    ID                         Type     Priority  Status   Submit Date
    auth-service               service  50        pending  2022-10-09T11:32:57+02:00
    ingress                    service  50        pending  2022-10-17T14:57:26+02:00
    plugin-aws-ebs-controller  service  50        running  2022-10-09T14:48:11+02:00
    plugin-aws-ebs-nodes       system   50        running  2022-10-09T14:48:11+02:00
    plugin-aws-efs-nodes       system   50        running  2022-10-09T11:37:04+02:00
    prometheus                 service  50        pending  2022-10-18T21:19:24+02:00
    spot-drainer               system   50        running  2022-10-11T18:04:49+02:00

    Job file (if appropriate)

    variable "stage" {
      type        = string
      description = "The stage for this jobfile."
    }
    
    variable "domain_suffix" {
      type        = string
      description = "The HDI stage specific domain suffix."
    }
    
    variable "acme_route" {
      type = string
    }
    
    variables {
      step_cli_version = "0.22.0"
      traefik_version  = "2.9.1"
    }
    
    job "ingress" {
    
      datacenters = [join("-", ["pd0011", var.stage])]
    
      type = "service"
    
      group "ingress" {
    
        constraint {
          attribute = meta.instance_type
          value     = "ingress"
        }
    
        count = 1
    
        service {
          name = "traefik"
          tags = [
            "traefik.enable=true",
    
            "traefik.http.routers.api.rule=Host(`ingress.dsp.${var.domain_suffix}`)",
            "traefik.http.routers.api.entrypoints=secure",
            "[email protected]",
            "traefik.http.routers.api.tls.certresolver=hdi_acme_resolver",
            "[email protected]",
            "[email protected]",
    
            "traefik.http.routers.ping.rule=Host(`ingress.dsp.${var.domain_suffix}`) && Path(`/ping`)",
            "traefik.http.routers.ping.entrypoints=secure",
            "[email protected]",
            "traefik.http.routers.ping.tls.certresolver=hdi_acme_resolver",
            "[email protected]",
            "[email protected]"
          ]
    
          port = "https"
    
          check {
            name     = "Traefik Ping Endpoint"
            type     = "http"
            protocol = "http"
            port     = "http"
            path     = "/ping"
            interval = "10s"
            timeout  = "2s"
          }
        }
    
        network {
    
          port "http" {
            static = 80
            to     = 80
          }
          port "https" {
            static = 443
            to     = 443
          }
        }
    
        ephemeral_disk {
          size    = "300"
          sticky  = true
          migrate = true
        }
    
        task "generate_consul_cert" {
    <snip>
        }
    
        task "generate_nomad_cert" {
    <snip>
        }
    
    
        task "traefik" {
    
          driver = "docker"
    
          env {
            LEGO_CA_CERTIFICATES = join(":", ["${NOMAD_SECRETS_DIR}/cacert.pem", "${NOMAD_SECRETS_DIR}/root_ca_${var.stage}.crt"])
            # LEGO_CA_SYSTEM_CERT_POOL = true
          }
    
          config {
            image = "traefik:${var.traefik_version}"
            volumes = [
              # Use absolute paths to mount arbitrary paths on the host
              "local/:/etc/traefik/",
              "/etc/timezone:/etc/timezone:ro",
              "/etc/localtime:/etc/localtime:ro",
            ]
            network_mode = "host"
            ports        = ["http", "https"]
          }
    
          resources {
            cpu    = 800
            memory = 128
          }
          # Controls the timeout between signalling a task it will be killed
          # and killing the task. If not set a default is used.
          kill_timeout = "60s"
    
          template {
            data        = <<EOH
    <snip>
        }
      }
    }
    
    

    Nomad Server logs (if appropriate)

    Oct 20 15:00:30 uat-nomad-95I nomad[485]:     2022-10-20T15:00:30.571+0200 [ERROR] worker: error invoking scheduler: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
    Oct 20 15:00:42 uat-nomad-95I nomad[485]:     2022-10-20T15:00:42.948+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab eval_id=827f0dfe-0584-b44a-92e2-9a92ab649c48 error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
    

    Nomad Client logs (if appropriate)

    Oct 20 11:55:00 uat-worker-wZz nomad[464]:              Log Level: INFO
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:                 Region: europe (DC: pd0011-uat)
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:                 Server: false
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:                Version: 1.4.1
    Oct 20 11:55:00 uat-worker-wZz nomad[464]: ==> Nomad agent started! Log data will stream in below:
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.798+0200 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.798+0200 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.798+0200 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.798+0200 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.798+0200 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.817+0200 [INFO]  client: using state directory: state_dir=/opt/hsy/nomad/data/client
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.826+0200 [INFO]  client: using alloc directory: alloc_dir=/opt/hsy/nomad/data/alloc
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.826+0200 [INFO]  client: using dynamic ports: min=20000 max=32000 reserved=""
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.831+0200 [INFO]  client.fingerprint_mgr.cgroup: cgroups are available
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.852+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=ens5
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.856+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.870+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=ens5
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.897+0200 [INFO]  client.plugin: starting plugin manager: plugin-type=csi
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.900+0200 [INFO]  client.plugin: starting plugin manager: plugin-type=driver
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.900+0200 [INFO]  client.plugin: starting plugin manager: plugin-type=device
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.906+0200 [ERROR] client: error discovering nomad servers: error="client.consul: unable to query Consul datacenters: Get \"https://127.0.0.1:8501/v1/catalog/datacenters\": dial tcp 127.0.0.1:8501: connect: connection refused"
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:55:00.437+0200 [INFO]  client: started client: node_id=5f21ebef-e0a9-8bd2-775a-61b3e32cac6e
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:55:00.437+0200 [WARN]  agent: not registering Nomad HTTPS Health Check because verify_https_client enabled
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:55:00.438+0200 [WARN]  client.server_mgr: no servers available
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:55:00.439+0200 [WARN]  client.server_mgr: no servers available
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:55:00.453+0200 [INFO]  client.consul: discovered following servers: servers=[10.194.73.146:4647, 10.194.74.253:4647, 10.194.75.103:4647]
    Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:55:00.501+0200 [INFO]  client: node registration complete
    Oct 20 11:55:06 uat-worker-wZz nomad[464]:     2022-10-20T11:55:06.856+0200 [INFO]  client: node registration complete
    Oct 20 11:55:14 uat-worker-wZz nomad[464]:     2022-10-20T11:55:14.893+0200 [INFO]  client.fingerprint_mgr.consul: consul agent is available
    Oct 20 11:55:21 uat-worker-wZz nomad[464]:     2022-10-20T11:55:21.417+0200 [INFO]  client: node registration complete
    
    type/bug theme/keyring 
    opened by bfqrst 47
  • [ui] Recompute Y Axis on data change

    [ui] Recompute Y Axis on data change

    Resolves #15098


    Uses a did-update triggered on yScale() to manually re-call the yAxis scale formatter. Similar change to what we did in https://github.com/hashicorp/nomad/pull/14814.


    opened by philrenaud 2
  • Create variables in nonexistent namespace

    Create variables in nonexistent namespace

    Nomad version

    v1.4.3

    Operating system and Environment details

    n/a

    Issue

    I can create variables in nonexistent namespaces using the CLI. This, however, does not seem possible from the web interface.

    Reproduction steps

    nomad var put -namespace=test123 test test=abc
    
    

    Expected Result

    namespace does not exist
    

    Actual Result

    Created variable "test" with modify index 116869

    type/bug stage/accepted theme/variables 
    opened by SamMousa 1
  • acl: replicate auth-methods from federated cluster leaders.

    acl: replicate auth-methods from federated cluster leaders.

    I rejigged some of the leader code so it is easier to follow and makes more sense when scrolling.

    The hash has been added to the stub object so this can be used when performing replication diffs as is done with other ACL objects. It also now includes the create and modify indexes.

    The auth method struct has JSON marshal/unmarshal implementations to allow for the time.Duration parameter.

    Related https://github.com/hashicorp/nomad/issues/13120

    opened by jrasell 0
Releases (v1.4.3)
  • v1.4.3 (Nov 22, 2022)

    1.4.3 (November 21, 2022)

    IMPROVEMENTS:

    • api: Added an API for counting evaluations that match a filter [GH-15147]
    • cli: Improved performance of eval delete with large filter sets [GH-15117]
    • consul: add trace logging around service registrations [GH-6115]
    • deps: Updated github.com/aws/aws-sdk-go from 1.44.84 to 1.44.126 [GH-15081]
    • deps: Updated github.com/docker/cli from 20.10.18+incompatible to 20.10.21+incompatible [GH-15078]
    • exec: Allow running commands from mounted host volumes [GH-14851]
    • scheduler: when multiple evaluations are pending for the same job, evaluate the latest and cancel the intermediaries on success [GH-14621]
    • server: Add a git revision tag to the serf tags gossiped between servers. [GH-9159]
    • template: Expose per-template configuration for error_on_missing_key. This allows jobspec authors to specify that a template should fail if it references a struct or map key that does not exist. The default value is false and should be fully backward compatible. [GH-14002]
    • ui: Adds a "Pack" tag and logo on the jobs list index when appropriate [GH-14833]
    • ui: add consul connect service upstream and on-update info to the service sidebar [GH-15324]
    • ui: allow users to upload files by click or drag in the web ui [GH-14747]

    BUG FIXES:

    • api: Ensure all request body decode errors return a 400 status code [GH-15252]
    • autopilot: Fixed a bug where autopilot would try to fetch raft stats from other regions [GH-15290]
    • cleanup: fixed missing timer.Reset for plan queue stat emitter [GH-15134]
    • client: Fixed a bug where tasks would restart without waiting for interval [GH-15215]
    • client: fixed a bug where non-docker tasks with network isolation would leak network namespaces and iptables rules if the client was restarted while they were running [GH-15214]
    • client: prevent allocations from failing on client reconnect by retrying RPC requests when no servers are available yet [GH-15140]
    • csi: Fixed race condition that can cause a panic when volume is garbage collected [GH-15101]
    • device: Fixed a bug where device plugins would not fingerprint on startup [GH-15125]
    • drivers: Fixed a bug where one goroutine was leaked per task [GH-15180]
    • drivers: pass missing propagation_mode configuration for volume mounts to external plugins [GH-15096]
    • event_stream: fixed a bug where dynamic port values would fail to serialize in the event stream [GH-12916]
    • fingerprint: Ensure Nomad can correctly fingerprint Consul gRPC where the Consul agent is running v1.14.0 or greater [GH-15309]
    • keyring: Fixed a bug where a missing key would prevent any further replication. [GH-15092]
    • keyring: Fixed a bug where replication would stop after snapshot restores [GH-15227]
    • keyring: Re-enabled keyring garbage collection after fixing a bug where keys would be garbage collected even if they were used to sign a live allocation's workload identity. [GH-15092]
    • scheduler: Fixed a bug that prevented disconnected allocations to be updated after they reconnect. [GH-15068]
    • scheduler: Prevent unnecessary placements when disconnected allocations reconnect. [GH-15068]
    • template: Fixed a bug where template could cause agent panic on startup [GH-15192]
    • ui: Fixed a bug where the task log sidebar would close and re-open if the parent job state changed [GH-15146]
    • variables: Fixed a bug where a long-running rekey could hit the nack timeout [GH-15102]
    • wi: Fixed a bug where clients running pre-1.4.0 allocations would erase the token used to query service registrations after upgrade [GH-15121]
    Source code(tar.gz)
    Source code(zip)
  • v1.3.8 (Nov 22, 2022)

    1.3.8 (November 21, 2022)

    BUG FIXES:

    • api: Ensure all request body decode errors return a 400 status code [GH-15252]
    • cleanup: fixed missing timer.Reset for plan queue stat emitter [GH-15134]
    • client: Fixed a bug where tasks would restart without waiting for interval [GH-15215]
    • client: fixed a bug where non-docker tasks with network isolation would leak network namespaces and iptables rules if the client was restarted while they were running [GH-15214]
    • client: prevent allocations from failing on client reconnect by retrying RPC requests when no servers are available yet [GH-15140]
    • csi: Fixed race condition that can cause a panic when volume is garbage collected [GH-15101]
    • device: Fixed a bug where device plugins would not fingerprint on startup [GH-15125]
    • drivers: Fixed a bug where one goroutine was leaked per task [GH-15180]
    • drivers: pass missing propagation_mode configuration for volume mounts to external plugins [GH-15096]
    • event_stream: fixed a bug where dynamic port values would fail to serialize in the event stream [GH-12916]
    • fingerprint: Ensure Nomad can correctly fingerprint Consul gRPC where the Consul agent is running v1.14.0 or greater [GH-15309]
    • scheduler: Fixed a bug that prevented disconnected allocations to be updated after they reconnect. [GH-15068]
    • scheduler: Prevent unnecessary placements when disconnected allocations reconnect. [GH-15068]
    • template: Fixed a bug where template could cause agent panic on startup [GH-15192]
    Source code(tar.gz)
    Source code(zip)
  • v1.2.15 (Nov 22, 2022)

    1.2.15 (November 21, 2022)

    BUG FIXES:

    • api: Ensure all request body decode errors return a 400 status code [GH-15252]
    • cleanup: fixed missing timer.Reset for plan queue stat emitter [GH-15134]
    • client: Fixed a bug where tasks would restart without waiting for interval [GH-15215]
    • client: fixed a bug where non-docker tasks with network isolation would leak network namespaces and iptables rules if the client was restarted while they were running [GH-15214]
    • csi: Fixed race condition that can cause a panic when volume is garbage collected [GH-15101]
    • device: Fixed a bug where device plugins would not fingerprint on startup [GH-15125]
    • drivers: Fixed a bug where one goroutine was leaked per task [GH-15180]
    • drivers: pass missing propagation_mode configuration for volume mounts to external plugins [GH-15096]
    • event_stream: fixed a bug where dynamic port values would fail to serialize in the event stream [GH-12916]
    • fingerprint: Ensure Nomad can correctly fingerprint Consul gRPC where the Consul agent is running v1.14.0 or greater [GH-15309]
    Source code(tar.gz)
    Source code(zip)
  • v1.4.2 (Oct 27, 2022)

    1.4.2 (October 26, 2022)

    SECURITY:

    • event stream: Fixed a bug where ACL token expiration was not checked when emitting events [GH-15013]

    IMPROVEMENTS:

    • cli: Added -id-prefix-template option to nomad job dispatch [GH-14631]
    • cli: add nomad fmt to the CLI [GH-14779]
    • deps: update go-memdb for goroutine leak fix [GH-14983]
    • docker: improve memory usage for docker_logger [GH-14875]
    • event stream: Added ACL role topic with create and delete types [GH-14923]
    • scheduler: Allow jobs not requiring network resources even when no network is fingerprinted [GH-14300]
    • ui: adds searching and filtering to the topology page [GH-14913]

    BUG FIXES:

    • acl: Callers should be able to read policies linked via roles to the token used [GH-14982]
    • acl: Ensure all federated servers meet v.1.4.0 minimum before ACL roles can be written [GH-14908]
    • acl: Fixed a bug where Nomad version checking for one-time tokens was enforced across regions [GH-14912]
    • cli: prevent a panic when the Nomad API returns an error while collecting a debug bundle [GH-14992]
    • client: Check ACL token expiry when resolving token within ACL cache [GH-14922]
    • client: Fixed a bug where Nomad could not detect cores on recent RHEL systems [GH-15027]
    • client: Fixed a bug where network fingerprinters were not reloaded when the client configuration was reloaded with SIGHUP [GH-14615]
    • client: Resolve ACL roles within client ACL cache [GH-14922]
    • consul: Fixed a bug where services continuously re-registered [GH-14917]
    • consul: atomically register checks on initial service registration [GH-14944]
    • deps: Update hashicorp/consul-template to 90370e07bf621811826b803fb633dadbfb4cf287; fixes template rerendering issues when only user or group set [GH-15045]
    • deps: Update hashicorp/raft to v1.3.11; fixes unstable leadership on server removal [GH-15021]
    • event stream: Check ACL token expiry when resolving tokens [GH-14923]
    • event stream: Resolve ACL roles within ACL tokens [GH-14923]
    • keyring: Fixed a bug where nomad system gc forced a root keyring rotation. [GH-15009]
    • keyring: Fixed a bug where if a key is rotated immediately following a leader election, plans that are in-flight may get signed before the new leader has the key. Allow for a short timeout-and-retry to avoid rejecting plans. [GH-14987]
    • keyring: Fixed a bug where keyring initialization is blocked by un-upgraded federated regions [GH-14901]
    • keyring: Fixed a bug where root keyring garbage collection configuration values were not respected. [GH-15009]
    • keyring: Fixed a bug where root keyring initialization could occur before the raft FSM on the leader was verified to be up-to-date. [GH-14987]
    • keyring: Fixed a bug where root keyring replication could make incorrectly stale queries and exit early if those queries did not return the expected key. [GH-14987]
    • keyring: Fixed a bug where the root keyring replicator's rate limiting would be skipped if the keyring replication exceeded the burst rate. [GH-14987]
    • keyring: Removed root key garbage collection to avoid orphaned workload identities [GH-15034]
    • nomad native service discovery: Ensure all local servers meet v.1.3.0 minimum before service registrations can be written [GH-14924]
    • scheduler: Fixed a bug where version checking for disconnected clients handling was enforced across regions [GH-14912]
    • servicedisco: Fixed a bug where job using checks could land on incompatible client [GH-14868]
    • services: Fixed a regression where check task validation stopped allowing some configurations [GH-14864]
    • ui: Fixed line charts to update x-axis (time) where relevant [GH-14814]
    • ui: Fixes an issue where service tags would bleed past the edge of the screen [GH-14832]
    • variables: Fixed a bug where Nomad version checking was not enforced for writing to variables [GH-14912]
    Source code(tar.gz)
    Source code(zip)
  • v1.3.7 (Oct 27, 2022)

    1.3.7 (October 26, 2022)

    IMPROVEMENTS:

    • deps: update go-memdb for goroutine leak fix [GH-14983]
    • docker: improve memory usage for docker_logger [GH-14875]

    BUG FIXES:

    • acl: Fixed a bug where Nomad version checking for one-time tokens was enforced across regions [GH-14911]
    • client: Fixed a bug where Nomad could not detect cores on recent RHEL systems [GH-15027]
    • consul: Fixed a bug where services continuously re-registered [GH-14917]
    • consul: atomically register checks on initial service registration [GH-14944]
    • deps: Update hashicorp/raft to v1.3.11; fixes unstable leadership on server removal [GH-15021]
    • nomad native service discovery: Ensure all local servers meet v.1.3.0 minimum before service registrations can be written [GH-14924]
    • scheduler: Fixed a bug where version checking for disconnected clients handling was enforced across regions [GH-14911]
    Source code(tar.gz)
    Source code(zip)
  • v1.2.14 (Oct 27, 2022)

    1.2.14 (October 26, 2022)

    IMPROVEMENTS:

    • deps: update go-memdb for goroutine leak fix [GH-14983]

    BUG FIXES:

    • acl: Fixed a bug where Nomad version checking for one-time tokens was enforced across regions [GH-14910]
    • deps: Update hashicorp/raft to v1.3.11; fixes unstable leadership on server removal [GH-15021]
    Source code(tar.gz)
    Source code(zip)
  • v1.4.1 (Oct 6, 2022)

  • v1.4.0 (Oct 6, 2022)

    1.4.0 (October 04, 2022)

    FEATURES:

    • ACL Roles: Added support for ACL Roles. [GH-14320]
    • Nomad Native Service Discovery: Add built-in support for checks on Nomad services [GH-13715]
    • Variables: Added support for storing encrypted configuration values. [GH-13000]
    • UI Services table: Display task-level services in addition to group-level services. [GH-14199]

    BREAKING CHANGES:

    • audit (Enterprise): fixed inconsistency in event filter logic [GH-14212]
    • cli: eval status -json no longer supports listing all evals in JSON. Use eval list -json. [GH-14651]
    • core: remove support for raft protocol version 2 [GH-13467]

    SECURITY:

    • client: recover from panics caused by artifact download to prevent the Nomad client from crashing [GH-14696]

    IMPROVEMENTS:

    • acl: ACL tokens can now be created with an expiration TTL. [GH-14320]
    • api: return a more descriptive error when /v1/acl/bootstrap fails to decode request body [GH-14629]
    • autopilot: upgrade to raft-autopilot library [GH-14441]
    • cli: Removed deprecated network quota fields from quota status output [GH-14468]
    • cli: acl policy info output format has changed to improve readability with large policy documents [GH-14140]
    • cli: operator debug now writes newline-delimited JSON files for large collections [GH-14610]
    • cli: ignore -hcl2-strict when -hcl1 is set. [GH-14426]
    • cli: warn destructive update only when count is greater than 1 [GH-13103]
    • client: Add built-in support for checks on nomad services [GH-13715]
    • client: re-enable nss-based user lookups [GH-14742]
    • connect: add namespace, job, and group to Envoy stats [GH-14311]
    • connect: add nomad environment variables to envoy bootstrap [GH-12959]
    • consul: Allow interpolation of task environment values into Consul Service Mesh configuration [GH-14445]
    • consul: Enable setting custom tagged_addresses field [GH-12951]
    • core: constraint operands are now compared numerically if operands are numbers [GH-14722]
    • deps: Update fsouza/go-dockerclient to v1.8.2 [GH-14112]
    • deps: Update go.etcd.io/bbolt to v1.3.6 [GH-14025]
    • deps: Update google.golang.org/grpc to v1.48.0 [GH-14103]
    • deps: Update gopsutil for improvements in fingerprinting on non-Linux platforms [GH-14209]
    • deps: Updated github.com/armon/go-metrics to v0.4.1 which includes a performance improvement for Prometheus sink [GH-14493]
    • deps: Updated github.com/hashicorp/go-version to v1.6.0 [GH-14364]
    • deps: remove unused darwin C library [GH-13894]
    • fingerprint: Add node attribute for number of reservable cores: cpu.num_reservable_cores [GH-14694]
    • fingerprint: Consul and Vault attributes are no longer cleared on fingerprinting failure [GH-14673]
    • jobspec: Added strlen HCL2 function to determine the length of a string [GH-14463]
    • server: Log when a node's eligibility changes [GH-14125]
    • ui: Display different message when trying to exec into a job with no task running. [GH-14071]
    • ui: add service discovery, along with health checks, to job and allocation routes [GH-14408]
    • ui: adds a sidebar to show in-page logs for a given task, accessible via job, client, or task group routes [GH-14612]
    • ui: allow deep-dive clicks to tasks from client, job, and task group routes. [GH-14592]
    • ui: attach timestamps and a visual indicator on failure to health checks in the Web UI [GH-14677]
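
    To illustrate the strlen HCL2 function mentioned in the improvements above, here is a hypothetical jobspec; the variable, image, and environment variable name are assumptions for the example, not part of the release notes:

    ```hcl
    # Sketch only: uses the strlen HCL2 function (GH-14463) in a jobspec.
    variable "service_name" {
      type    = string
      default = "payments-api"
    }

    job "strlen-demo" {
      datacenters = ["dc1"]

      group "app" {
        task "app" {
          driver = "docker"

          config {
            image   = "busybox:1.36"
            command = "sleep"
            args    = ["3600"]
          }

          env {
            # Interpolation converts the numeric strlen result to a string.
            SERVICE_NAME_LENGTH = "${strlen(var.service_name)}"
          }
        }
      }
    }
    ```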

    BUG FIXES:

    • api: Fixed a bug where the List Volume API did not include the ControllerRequired and ResourceExhausted fields. [GH-14484]
    • cli: Ignore Vault token when generating job diff. [GH-14424]
    • cli: fixed a bug in the operator api command where the HTTPS scheme was not always correctly calculated [GH-14635]
    • cli: return exit code 255 when nomad job plan fails job validation. [GH-14426]
    • cli: set content length on POST requests when using the nomad operator api command [GH-14634]
    • client: Fixed bug where clients could attempt to connect to servers with invalid addresses retrieved from Consul. [GH-14431]
    • core: prevent new allocations from overlapping execution with stopping allocations [GH-10446]
    • csi: Fixed a bug where a volume that was successfully unmounted by the client but then failed controller unpublishing would not be marked free until garbage collection ran. [GH-14675]
    • csi: Fixed a bug where the server would not send controller unpublish for a failed allocation. [GH-14484]
    • csi: Fixed a data race in the volume unpublish endpoint that could result in claims being incorrectly marked as freed before being persisted to raft. [GH-14484]
    • helpers: Fixed a bug where random stagger func did not protect against negative inputs [GH-14497]
    • jobspec: Fixed a bug where an artifact with headers configuration would fail to parse when using HCLv1 [GH-14637]
    • metrics: Update client node_scheduling_eligibility value with server heartbeats. [GH-14483]
    • quotas (Enterprise): Fixed a server crashing panic when updating and checking a quota concurrently.
    • rpc (Enterprise): check for spec changes in all regions when registering multiregion jobs [GH-14519]
    • scheduler (Enterprise): Fixed bug where the scheduler would treat multiregion jobs as paused for job types that don't use deployments [GH-14659]
    • template: Fixed a bug where the splay timeout was not being applied when change_mode was set to script. [GH-14749]
    • ui: Remove extra space when displaying the version in the menu footer. [GH-14457]
    Source code(tar.gz)
    Source code(zip)
  • v1.3.6(Oct 6, 2022)

    1.3.6 (October 04, 2022)

    SECURITY:

    • client: recover from panics caused by artifact download to prevent the Nomad client from crashing [GH-14696]

    IMPROVEMENTS:

    • api: return a more descriptive error when /v1/acl/bootstrap fails to decode request body [GH-14629]
    • cli: ignore -hcl2-strict when -hcl1 is set. [GH-14426]
    • cli: warn destructive update only when count is greater than 1 [GH-13103]
    • consul: Allow interpolation of task environment values into Consul Service Mesh configuration [GH-14445]
    • ui: Display different message when trying to exec into a job with no task running. [GH-14071]

    BUG FIXES:

    • api: Fixed a bug where the List Volume API did not include the ControllerRequired and ResourceExhausted fields. [GH-14484]
    • cli: Ignore Vault token when generating job diff. [GH-14424]
    • cli: fixed a bug in the operator api command where the HTTPS scheme was not always correctly calculated [GH-14635]
    • cli: return exit code 255 when nomad job plan fails job validation. [GH-14426]
    • cli: set content length on POST requests when using the nomad operator api command [GH-14634]
    • client: Fixed bug where clients could attempt to connect to servers with invalid addresses retrieved from Consul. [GH-14431]
    • csi: Fixed a bug where a volume that was successfully unmounted by the client but then failed controller unpublishing would not be marked free until garbage collection ran. [GH-14675]
    • csi: Fixed a bug where the server would not send controller unpublish for a failed allocation. [GH-14484]
    • csi: Fixed a data race in the volume unpublish endpoint that could result in claims being incorrectly marked as freed before being persisted to raft. [GH-14484]
    • helpers: Fixed a bug where random stagger func did not protect against negative inputs [GH-14497]
    • jobspec: Fixed a bug where an artifact with headers configuration would fail to parse when using HCLv1 [GH-14637]
    • metrics: Update client node_scheduling_eligibility value with server heartbeats. [GH-14483]
    • quotas (Enterprise): Fixed a server crashing panic when updating and checking a quota concurrently.
    • rpc: check for spec changes in all regions when registering multiregion jobs [GH-14519]
    • scheduler: Fixed bug where the scheduler would treat multiregion jobs as paused for job types that don't use deployments [GH-14659]
    • template: Fixed a bug where the splay timeout was not being applied when change_mode was set to script. [GH-14749]
    • ui: Remove extra space when displaying the version in the menu footer. [GH-14457]
    Source code(tar.gz)
    Source code(zip)
  • v1.2.13(Oct 6, 2022)

    1.2.13 (October 04, 2022)

    SECURITY:

    • client: recover from panics caused by artifact download to prevent the Nomad client from crashing [GH-14696]

    BUG FIXES:

    • api: Fixed a bug where the List Volume API did not include the ControllerRequired and ResourceExhausted fields. [GH-14484]
    • client: Fixed bug where clients could attempt to connect to servers with invalid addresses retrieved from Consul. [GH-14431]
    • csi: Fixed a bug where a volume that was successfully unmounted by the client but then failed controller unpublishing would not be marked free until garbage collection ran. [GH-14675]
    • csi: Fixed a bug where the server would not send controller unpublish for a failed allocation. [GH-14484]
    • csi: Fixed a bug where volume claims on lost or garbage collected nodes could not be freed [GH-14720]
    • csi: Fixed a data race in the volume unpublish endpoint that could result in claims being incorrectly marked as freed before being persisted to raft. [GH-14484]
    • jobspec: Fixed a bug where an artifact with headers configuration would fail to parse when using HCLv1 [GH-14637]
    • metrics: Update client node_scheduling_eligibility value with server heartbeats. [GH-14483]
    • quotas (Enterprise): Fixed a server crashing panic when updating and checking a quota concurrently.
    • rpc: check for spec changes in all regions when registering multiregion jobs [GH-14519]
    Source code(tar.gz)
    Source code(zip)
  • v1.4.0-rc.1(Sep 27, 2022)

    1.4.0 (Unreleased)

    FEATURES:

    • ACL Roles: Added support for ACL Roles. [GH-14320]
    • Nomad Native Service Discovery: Add built-in support for checks on Nomad services [GH-13715]
    • Variables: Added support for storing encrypted configuration values. [GH-13000]
    • UI Services table: Display task-level services in addition to group-level services. [GH-14199]

    BREAKING CHANGES:

    • audit (Enterprise): fixed inconsistency in event filter logic [GH-14212]
    • core: remove support for raft protocol version 2 [GH-13467]
    • cli: eval status -json no longer supports listing all evals in JSON. Use eval list -json. [GH-14651]

    SECURITY:

    • client: recover from panics caused by artifact download to prevent the Nomad client from crashing [GH-14696]

    IMPROVEMENTS:

    • acl: ACL tokens can now be created with an expiration TTL. [GH-14320]
    • api: return a more descriptive error when /v1/acl/bootstrap fails to decode request body [GH-14629]
    • autopilot: upgrade to raft-autopilot library [GH-14441]
    • build: Update go toolchain to 1.19 [GH-14132]
    • cli: Removed deprecated network quota fields from quota status output [GH-14468]
    • cli: acl policy info output format has changed to improve readability with large policy documents [GH-14140]
    • cli: operator debug now writes newline-delimited JSON files for large collections [GH-14610]
    • cli: ignore -hcl2-strict when -hcl1 is set. [GH-14426]
    • cli: warn destructive update only when count is greater than 1 [GH-13103]
    • client: Add built-in support for checks on nomad services [GH-13715]
    • connect: add namespace, job, and group to Envoy stats [GH-14311]
    • connect: add nomad environment variables to envoy bootstrap [GH-12959]
    • consul: Allow interpolation of task environment values into Consul Service Mesh configuration [GH-14445]
    • consul: Enable setting custom tagged_addresses field [GH-12951]
    • core: constraint operands are now compared numerically if operands are numbers [GH-14722]
    • deps: Update fsouza/go-dockerclient to v1.8.2 [GH-14112]
    • deps: Update go.etcd.io/bbolt to v1.3.6 [GH-14025]
    • deps: Update google.golang.org/grpc to v1.48.0 [GH-14103]
    • deps: Update gopsutil for improvements in fingerprinting on non-Linux platforms [GH-14209]
    • deps: Updated github.com/armon/go-metrics to v0.4.1 which includes a performance improvement for Prometheus sink [GH-14493]
    • deps: Updated github.com/hashicorp/go-version to v1.6.0 [GH-14364]
    • deps: remove unused darwin C library [GH-13894]
    • fingerprint: Add node attribute for number of reservable cores: cpu.num_reservable_cores [GH-14694]
    • fingerprint: Consul and Vault attributes are no longer cleared on fingerprinting failure [GH-14673]
    • jobspec: Added strlen HCL2 function to determine the length of a string [GH-14463]
    • server: Log when a node's eligibility changes [GH-14125]
    • ui: Display different message when trying to exec into a job with no task running. [GH-14071]
    • ui: add service discovery, along with health checks, to job and allocation routes [GH-14408]
    • ui: adds a sidebar to show in-page logs for a given task, accessible via job, client, or task group routes [GH-14612]
    • ui: allow deep-dive clicks to tasks from client, job, and task group routes. [GH-14592]
    • ui: attach timestamps and a visual indicator on failure to health checks in the Web UI [GH-14677]

    BUG FIXES:

    • api: Fixed a bug where the List Volume API did not include the ControllerRequired and ResourceExhausted fields. [GH-14484]
    • cli: Ignore Vault token when generating job diff. [GH-14424]
    • cli: fixed a bug in the operator api command where the HTTPS scheme was not always correctly calculated [GH-14635]
    • cli: return exit code 255 when nomad job plan fails job validation. [GH-14426]
    • cli: set content length on POST requests when using the nomad operator api command [GH-14634]
    • client: Fixed bug where clients could attempt to connect to servers with invalid addresses retrieved from Consul. [GH-14431]
    • core: prevent new allocations from overlapping execution with stopping allocations [GH-10446]
    • csi: Fixed a bug where a volume that was successfully unmounted by the client but then failed controller unpublishing would not be marked free until garbage collection ran. [GH-14675]
    • csi: Fixed a bug where the server would not send controller unpublish for a failed allocation. [GH-14484]
    • csi: Fixed a data race in the volume unpublish endpoint that could result in claims being incorrectly marked as freed before being persisted to raft. [GH-14484]
    • helpers: Fixed a bug where random stagger func did not protect against negative inputs [GH-14497]
    • jobspec: Fixed a bug where an artifact with headers configuration would fail to parse when using HCLv1 [GH-14637]
    • metrics: Update client node_scheduling_eligibility value with server heartbeats. [GH-14483]
    • quotas (Enterprise): Fixed a server crashing panic when updating and checking a quota concurrently.
    • rpc (Enterprise): check for spec changes in all regions when registering multiregion jobs [GH-14519]
    • scheduler (Enterprise): Fixed bug where the scheduler would treat multiregion jobs as paused for job types that don't use deployments [GH-14659]
    • ui: Remove extra space when displaying the version in the menu footer. [GH-14457]
    Source code(tar.gz)
    Source code(zip)
  • v1.4.0-beta.1(Sep 15, 2022)

    1.4.0 (Unreleased)

    FEATURES:

    • ACL Roles: Added support for ACL Roles. [GH-14320]
    • Nomad Native Service Discovery: Add built-in support for checks on Nomad services [GH-13715]
    • Variables: Added support for storing encrypted configuration values. [GH-13000]
    • UI Services table: Display task-level services in addition to group-level services. [GH-14199]

    BREAKING CHANGES:

    • audit (Enterprise): fixed inconsistency in event filter logic [GH-14212]
    • core: remove support for raft protocol version 2 [GH-13467]

    IMPROVEMENTS:

    • acl: ACL tokens can now be created with an expiration TTL. [GH-14320]
    • autopilot: upgrade to raft-autopilot library [GH-14441]
    • build: Update go toolchain to 1.19 [GH-14132]
    • cli: Removed deprecated network quota fields from quota status output [GH-14468]
    • cli: acl policy info output format has changed to improve readability with large policy documents [GH-14140]
    • cli: ignore -hcl2-strict when -hcl1 is set. [GH-14426]
    • cli: warn destructive update only when count is greater than 1 [GH-13103]
    • client: Add built-in support for checks on nomad services [GH-13715]
    • consul: Allow interpolation of task environment values into Consul Service Mesh configuration [GH-14445]
    • consul: Enable setting custom tagged_addresses field [GH-12951]
    • deps: Update fsouza/go-dockerclient to v1.8.2 [GH-14112]
    • deps: Update go.etcd.io/bbolt to v1.3.6 [GH-14025]
    • deps: Update google.golang.org/grpc to v1.48.0 [GH-14103]
    • deps: Update gopsutil for improvements in fingerprinting on non-Linux platforms [GH-14209]
    • deps: Updated github.com/armon/go-metrics to v0.4.1 which includes a performance improvement for Prometheus sink [GH-14493]
    • deps: Updated github.com/hashicorp/go-version to v1.6.0 [GH-14364]
    • deps: remove unused darwin C library [GH-13894]
    • jobspec: Added strlen HCL2 function to determine the length of a string [GH-14463]
    • server: Log when a node's eligibility changes [GH-14125]
    • ui: Display different message when trying to exec into a job with no task running. [GH-14071]
    • ui: add service discovery, along with health checks, to job and allocation routes [GH-14408]
    • ui: added visual regression tests for top-level UI routes [GH-12872]

    BUG FIXES:

    • api: Fixed a bug where the List Volume API did not include the ControllerRequired and ResourceExhausted fields. [GH-14484]
    • cli: Ignore Vault token when generating job diff. [GH-14424]
    • cli: return exit code 255 when nomad job plan fails job validation. [GH-14426]
    • client: Fixed bug where clients could attempt to connect to servers with invalid addresses retrieved from Consul. [GH-14431]
    • core: prevent new allocations from overlapping execution with stopping allocations [GH-10446]
    • csi: Fixed a bug where the server would not send controller unpublish for a failed allocation. [GH-14484]
    • csi: Fixed a data race in the volume unpublish endpoint that could result in claims being incorrectly marked as freed before being persisted to raft. [GH-14484]
    • helpers: Fixed a bug where random stagger func did not protect against negative inputs [GH-14497]
    • metrics: Update client node_scheduling_eligibility value with server heartbeats. [GH-14483]
    • quotas (Enterprise): Fixed a server crashing panic when updating and checking a quota concurrently.
    • rpc: check for spec changes in all regions when registering multiregion jobs [GH-14519]
    • ui: Remove extra space when displaying the version in the menu footer. [GH-14457]
    • ui: Stabilizes visual regression tests [GH-14551]
    • ui: when creating a secure variable, check against your namespaces rather than assuming default [GH-13991]
    Source code(tar.gz)
    Source code(zip)
  • v1.3.5(Aug 31, 2022)

    1.3.5 (August 31, 2022)

    IMPROVEMENTS:

    • cgroups: use cgroup.kill interface file when using cgroups v2 [GH-14371]
    • consul: Reduce load on Consul leader server by allowing stale results when listing namespaces. [GH-12953]

    BUG FIXES:

    • cli: Fixed a bug where forcing a periodic job would fail if the job ID prefix-matched other periodic jobs [GH-14333]
    • template: Fixed a bug that could cause Nomad to panic when using change_mode = "script" [GH-14374]
    • ui: Revert a change that resulted in UI errors when ACLs were not used. [GH-14381]
    Source code(tar.gz)
    Source code(zip)
  • v1.2.12(Aug 31, 2022)

    1.2.12 (August 31, 2022)

    IMPROVEMENTS:

    • consul: Reduce load on Consul leader server by allowing stale results when listing namespaces. [GH-12953]

    BUG FIXES:

    • cli: Fixed a bug where forcing a periodic job would fail if the job ID prefix-matched other periodic jobs [GH-14333]
    Source code(tar.gz)
    Source code(zip)
  • v1.1.18(Aug 31, 2022)

  • v1.3.4(Aug 25, 2022)

    1.3.4 (August 25, 2022)

    IMPROVEMENTS:

    • api: HTTP server now returns a 429 error code when hitting the connection limit [GH-13621]
    • build: update to go1.19 [GH-14132]
    • cli: operator debug now outputs current leader to debug bundle [GH-13472]
    • cli: operator snapshot state supports -filter expressions and avoids writing large temporary files [GH-13658]
    • client: add option to restart all tasks of an allocation, regardless of lifecycle type or state. [GH-14127]
    • client: only start poststop tasks after poststart tasks are done. [GH-14127]
    • deps: Updated github.com/hashicorp/go-discover to latest to allow setting the AWS endpoint definition [GH-13491]
    • driver/docker: Added config option to disable container healthcheck [GH-14089]
    • qemu: Added option to configure drive_interface [GH-11864]
    • sentinel: add the ability to reference the namespace and Nomad acl token in policies [GH-14171]
    • template: add script change_mode that allows scripts to be executed on template change [GH-13972]
    • ui: Add button to restart all tasks in an allocation. [GH-14223]
    • ui: add general keyboard navigation to the Nomad UI [GH-14138]
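
    The script change_mode noted above lets a template change run a script instead of sending a signal or restarting the task. A hedged sketch, assuming a hypothetical reload script shipped inside the task's filesystem:

    ```hcl
    # Sketch only: script change_mode for templates (GH-13972).
    job "script-reload-demo" {
      datacenters = ["dc1"]

      group "app" {
        task "app" {
          driver = "docker"

          config {
            image = "nginx:1.23"
          }

          template {
            data        = "alloc = {{ env \"NOMAD_ALLOC_ID\" }}"
            destination = "local/app.conf"
            change_mode = "script"

            change_script {
              command       = "/local/reload.sh" # hypothetical script path
              timeout       = "20s"
              fail_on_error = false
            }
          }
        }
      }
    }
    ```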

    BUG FIXES:

    • api: cleanup whitespace from failed api response body [GH-14145]
    • cli: Fixed a bug where the job validation request was not sent to the leader [GH-14065]
    • cli: Fixed a bug where the memory usage reported by Allocation Resource Utilization is zero on systems using cgroups v2 [GH-14069]
    • cli: Fixed a bug where the Vault token was not respected in the plan command [GH-14088]
    • client/logmon: fixed a bug where logmon cannot find nomad executable [GH-14297]
    • client: Fixed a bug where cpuset initialization would not work on first agent startup [GH-14230]
    • client: Fixed a bug where user lookups would hang or panic [GH-14248]
    • client: Fixed a problem calculating a service's namespace [GH-13493]
    • csi: Fixed a bug where volume claims on lost or garbage collected nodes could not be freed [GH-13301]
    • template: Fixed a bug where job templates would use uid and gid 0 after upgrading to Nomad 1.3.3, causing tasks to fail with the error "failed looking up user: managing file ownership is not supported on Windows". [GH-14203]
    • ui: Fixed a bug that caused the allocation details page to display the stats bar chart even if the task was pending. [GH-14224]
    • ui: Removes duplicate breadcrumb header when navigating from child job back to parent. [GH-14115]
    • vault: Fixed a bug where Vault clients were recreated when the server configuration was reloaded, even if there were no changes to the Vault configuration. [GH-14298]
    • vault: Fixed a bug where changing the Vault configuration namespace field was not detected as a change during server configuration reload. [GH-14298]
    Source code(tar.gz)
    Source code(zip)
  • v1.2.11(Aug 25, 2022)

    1.2.11 (August 25, 2022)

    BUG FIXES:

    • api: cleanup whitespace from failed api response body [GH-14145]
    • client/logmon: fixed a bug where logmon cannot find nomad executable [GH-14297]
    • client: Fixed a bug where user lookups would hang or panic [GH-14248]
    • ui: Fixed a bug that caused the allocation details page to display the stats bar chart even if the task was pending. [GH-14224]
    • vault: Fixed a bug where Vault clients were recreated when the server configuration was reloaded, even if there were no changes to the Vault configuration. [GH-14298]
    • vault: Fixed a bug where changing the Vault configuration namespace field was not detected as a change during server configuration reload. [GH-14298]
    Source code(tar.gz)
    Source code(zip)
  • v1.1.17(Aug 25, 2022)

    1.1.17 (August 25, 2022)

    BUG FIXES:

    • client/logmon: fixed a bug where logmon cannot find nomad executable [GH-14297]
    • ui: Fixed a bug that caused the allocation details page to display the stats bar chart even if the task was pending. [GH-14224]
    • vault: Fixed a bug where Vault clients were recreated when the server configuration was reloaded, even if there were no changes to the Vault configuration. [GH-14298]
    • vault: Fixed a bug where changing the Vault configuration namespace field was not detected as a change during server configuration reload. [GH-14298]
    Source code(tar.gz)
    Source code(zip)
  • v1.3.3(Aug 5, 2022)

    1.3.3 (August 05, 2022)

    IMPROVEMENTS:

    • csi: Add stage_publish_base_dir field to csi_plugin block to support plugins that require a specific staging/publishing directory for mounts [GH-13919]
    • qemu: use shorter socket file names to reduce the chance of hitting the max path length [GH-13971]
    • template: Expose consul-template configuration options at the client level for nomad_retry. [GH-13907]
    • template: Templates support new uid/gid parameter pair [GH-13755]
    • ui: Reorder and apply the same style to the Evaluations list page filters to match the Job list page. [GH-13866]
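
    The uid/gid template parameters noted above control ownership of the rendered file. A minimal sketch, assuming a non-root user with matching IDs exists inside the task's image:

    ```hcl
    # Sketch only: uid/gid template parameters (GH-13755).
    job "template-owner-demo" {
      datacenters = ["dc1"]

      group "app" {
        task "app" {
          driver = "docker"

          config {
            image   = "busybox:1.36"
            command = "sleep"
            args    = ["3600"]
          }

          template {
            data        = "rendered for {{ env \"NOMAD_ALLOC_NAME\" }}"
            destination = "local/greeting.txt"
            perms       = "0640"
            uid         = 1000 # assumes this user/group exists in the image
            gid         = 1000
          }
        }
      }
    }
    ```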

    BUG FIXES:

    • acl: Fixed a bug where the timestamp for expiring one-time tokens was not deterministic between servers [GH-13737]
    • deployments: Fixed a bug that prevented auto-approval if canaries were marked as unhealthy during deployment [GH-14001]
    • metrics: Fixed a bug where blocked evals with no class produced no dc:class scope metrics [GH-13786]
    • namespaces: Fixed a bug that allowed deleting a namespace that contained a CSI volume [GH-13880]
    • qemu: restore the monitor socket path when restoring a QEMU task. [GH-14000]
    • servicedisco: Fixed a bug where non-unique services would escape job validation [GH-13869]
    • ui: Add missing breadcrumb in the Evaluations page. [GH-13865]
    • ui: Fixed a bug where task memory was reported as zero on systems using cgroups v2 [GH-13670]
    Source code(tar.gz)
    Source code(zip)
  • v1.2.10(Aug 5, 2022)

    1.2.10 (August 05, 2022)

    BUG FIXES:

    • acl: Fixed a bug where the timestamp for expiring one-time tokens was not deterministic between servers [GH-13737]
    • deployments: Fixed a bug that prevented auto-approval if canaries were marked as unhealthy during deployment [GH-14001]
    • metrics: Fixed a bug where blocked evals with no class produced no dc:class scope metrics [GH-13786]
    • namespaces: Fixed a bug that allowed deleting a namespace that contained a CSI volume [GH-13880]
    • qemu: restore the monitor socket path when restoring a QEMU task. [GH-14000]
    Source code(tar.gz)
    Source code(zip)
  • v1.1.16(Aug 5, 2022)

    1.1.16 (August 05, 2022)

    BUG FIXES:

    • acl: Fixed a bug where the timestamp for expiring one-time tokens was not deterministic between servers [GH-13737]
    • deployments: Fixed a bug that prevented auto-approval if canaries were marked as unhealthy during deployment [GH-14001]
    • namespaces: Fixed a bug that allowed deleting a namespace that contained a CSI volume [GH-13880]
    • qemu: restore the monitor socket path when restoring a QEMU task. [GH-14000]
    Source code(tar.gz)
    Source code(zip)
  • v1.3.2(Jul 13, 2022)

    1.3.2 (July 13, 2022)

    IMPROVEMENTS:

    • agent: Added delete support to the eval HTTP API [GH-13492]
    • agent: emit a warning message if the agent starts with bootstrap_expect set to an even number. [GH-12961]
    • agent: logs are no longer buffered at startup when logging in JSON format [GH-13076]
    • api: enable setting ?choose parameter when querying services [GH-12862]
    • api: refactor ACL check when using the all namespaces wildcard in the job and alloc list endpoints [GH-13608]
    • api: support Authorization Bearer header in lieu of X-Nomad-Token header [GH-12534]
    • bootstrap: Added option to allow for an operator generated bootstrap token to be passed to the acl bootstrap command [GH-12520]
    • cli: Added delete command to the eval CLI [GH-13492]
    • cli: Added scheduler get-config and scheduler set-config commands to the operator CLI [GH-13045]
    • cli: always display job ID and namespace in the eval status command [GH-13581]
    • cli: display namespace and node ID in the eval list command and when eval status matches multiple evals [GH-13581]
    • cli: update default redis and use nomad service discovery [GH-13044]
    • client: added more fault tolerant defaults for template configuration [GH-13041]
    • core: Added the ability to pause and un-pause the eval broker and blocked eval broker [GH-13045]
    • core: On node updates skip creating evaluations for jobs not in the node's datacenter. [GH-12955]
    • core: automatically mark clients with recurring plan rejections as ineligible [GH-13421]
    • driver/docker: Eliminate excess Docker registry pulls for the infra_image when it already exists locally. [GH-13265]
    • fingerprint: add support for detecting kernel architecture of clients. (attribute: kernel.arch) [GH-13182]
    • hcl: added support for using the filebase64 function in jobspecs [GH-11791]
    • metrics: emit nomad.nomad.plan.rejection_tracker.node_score metric for the number of times a node had a plan rejection within the past time window [GH-13421]
    • qemu: add support for guest agent socket [GH-12800]
    • ui: Namespace filter query parameters are now isolated by route [GH-13679]
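
    As an example of the filebase64 function listed above, the hypothetical jobspec below embeds a local file, base64-encoded, into an environment variable; the file path and variable name are assumptions:

    ```hcl
    # Sketch only: the filebase64 HCL2 function (GH-11791) in a jobspec.
    job "filebase64-demo" {
      datacenters = ["dc1"]

      group "app" {
        task "app" {
          driver = "docker"

          config {
            image   = "busybox:1.36"
            command = "sleep"
            args    = ["3600"]
          }

          env {
            # Read relative to where the jobspec is parsed (e.g. the CLI host).
            APP_CONFIG_B64 = filebase64("files/config.json")
          }
        }
      }
    }
    ```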

    BUG FIXES:

    • api: Fix listing evaluations with the wildcard namespace and an ACL token [GH-13530]
    • api: Fixed a bug where Consul token was not respected for job revert API [GH-13065]
    • cli: Fixed a bug in the names of the node drain and node status sub-commands [GH-13656]
    • cli: Fixed a bug where job validate did not respect vault token or namespace [GH-13070]
    • client: Fixed a bug where max_kill_timeout client config was ignored [GH-13626]
    • client: Fixed a bug where network.dns block was not interpolated [GH-12817]
    • cni: Fixed a bug where loopback address was not set for all drivers [GH-13428]
    • connect: Added the missing ability to set the Connect upstream destination namespace [GH-13125]
    • core: Fixed a bug where an evicted batch job would not be rescheduled [GH-13205]
    • core: Fixed a bug where blocked eval resources were incorrectly computed [GH-13104]
    • core: Fixed a bug where reserved ports on multiple node networks would be treated as a collision. client.reserved.reserved_ports is now merged into each host_network's reserved ports instead of being treated as a collision. [GH-13651]
    • core: Fixed a bug where the plan applier could deadlock if leader's state lagged behind plan's creation index for more than 5 seconds. [GH-13407]
    • csi: Fixed a regression where a 30s timeout incorrectly marked some plugins as unhealthy and prevented them from running; the timeout is now configurable via a new health_timeout field [GH-13340]
    • csi: Fixed a scheduler bug where failed feasibility checks would return early and prevent processing additional nodes [GH-13274]
    • docker: Fixed a bug where cgroups-v1 parent was being set [GH-13058]
    • lifecycle: fixed a bug where sidecar tasks were not being stopped last [GH-13055]
    • state: Fix listing evaluations from all namespaces [GH-13551]
    • ui: Allow running jobs from a namespace-limited token [GH-13659]
    • ui: Fix a bug that prevented viewing the details of an evaluation in a non-default namespace [GH-13530]
    • ui: Fixed a bug that prevented the UI task exec functionality from working behind a reverse proxy. [GH-12925]
    • ui: Fixed an issue where editing or running a job with a namespace via the UI would throw a 404 on redirect. [GH-13588]
    • ui: fixed a bug where links to jobs with "@" in their name would mis-identify namespace and 404 [GH-13012]
    • volumes: Fixed a bug where additions, updates, or removals of host volumes or CSI volumes were not treated as destructive updates [GH-13008]
    Source code(tar.gz)
    Source code(zip)
  • v1.2.9(Jul 13, 2022)

    1.2.9 (July 13, 2022)

    BUG FIXES:

    • api: Fix listing evaluations with the wildcard namespace and an ACL token [GH-13552]
    • api: Fixed a bug where Consul token was not respected for job revert API [GH-13065]
    • cli: Fixed a bug in the names of the node drain and node status sub-commands [GH-13656]
    • client: Fixed a bug where max_kill_timeout client config was ignored [GH-13626]
    • client: Fixed a bug where network.dns block was not interpolated [GH-12817]
    • cni: Fixed a bug where loopback address was not set for all drivers [GH-13428]
    • connect: Added the missing ability to set the Connect upstream destination namespace [GH-13125]
    • core: Fixed a bug where an evicted batch job would not be rescheduled [GH-13205]
    • core: Fixed a bug where blocked eval resources were incorrectly computed [GH-13104]
    • core: Fixed a bug where reserved ports on multiple node networks would be treated as a collision. client.reserved.reserved_ports is now merged into each host_network's reserved ports instead of being treated as a collision. [GH-13651]
    • core: Fixed a bug where the plan applier could deadlock if leader's state lagged behind plan's creation index for more than 5 seconds. [GH-13407]
    • csi: Fixed a regression where a 30s timeout incorrectly marked some plugins as unhealthy and prevented them from running; the timeout is now configurable via a new health_timeout field [GH-13340]
    • csi: Fixed a scheduler bug where failed feasibility checks would return early and prevent processing additional nodes [GH-13274]
    • lifecycle: fixed a bug where sidecar tasks were not being stopped last [GH-13055]
    • state: Fix listing evaluations from all namespaces [GH-13551]
    • ui: Allow running jobs from a namespace-limited token [GH-13659]
    • ui: Fixed a bug that prevented the UI task exec functionality from working behind a reverse proxy. [GH-12925]
    • volumes: Fixed a bug where additions, updates, or removals of host volumes or CSI volumes were not treated as destructive updates [GH-13008]
    Source code(tar.gz)
    Source code(zip)
  • v1.1.15(Jul 13, 2022)

    1.1.15 (July 13, 2022)

    BUG FIXES:

    • api: Fixed a bug where Consul token was not respected for job revert API [GH-13065]
    • cli: Fixed a bug in the names of the node drain and node status sub-commands [GH-13656]
    • client: Fixed a bug where max_kill_timeout client config was ignored [GH-13626]
    • cni: Fixed a bug where loopback address was not set for all drivers [GH-13428]
    • core: Fixed a bug where an evicted batch job would not be rescheduled [GH-13205]
    • core: Fixed a bug where reserved ports on multiple node networks would be treated as a collision. client.reserved.reserved_ports is now merged into each host_network's reserved ports instead of being treated as a collision. [GH-13651]
    • core: Fixed a bug where the plan applier could deadlock if leader's state lagged behind plan's creation index for more than 5 seconds. [GH-13407]
    • csi: Fixed a regression where a 30s timeout incorrectly marked some plugins as unhealthy and prevented them from running; the timeout is now configurable via a new health_timeout field [GH-13340]
    • csi: Fixed a scheduler bug where failed feasibility checks would return early and prevent processing additional nodes [GH-13274]
    • lifecycle: fixed a bug where sidecar tasks were not being stopped last [GH-13055]
    • ui: Allow running jobs from a namespace-limited token [GH-13659]
    • ui: Fixed a bug that prevented the UI task exec functionality from working behind a reverse proxy. [GH-12925]
    • volumes: Fixed a bug where additions, updates, or removals of host volumes or CSI volumes were not treated as destructive updates [GH-13008]
    Source code(tar.gz)
    Source code(zip)
  • v1.3.1(May 20, 2022)

    1.3.1 (May 19, 2022)

    SECURITY:

    • A vulnerability was identified in the go-getter library that Nomad uses for its artifacts such that a specially crafted Nomad jobspec can be used for privilege escalation onto client agent hosts. CVE-2022-30324 [GH-13057]

    BUG FIXES:

    • agent: fixed a panic on startup when the server.protocol_version config parameter was set [GH-12962]
    Source code(tar.gz)
    Source code(zip)
  • v1.2.8(May 20, 2022)

    1.2.8 (May 19, 2022)

    SECURITY:

    • A vulnerability was identified in the go-getter library that Nomad uses for its artifacts such that a specially crafted Nomad jobspec can be used for privilege escalation onto client agent hosts. CVE-2022-30324 [GH-13057]
    Source code(tar.gz)
    Source code(zip)
  • v1.1.14(May 20, 2022)

    1.1.14 (May 19, 2022)

    SECURITY:

    • A vulnerability was identified in the go-getter library that Nomad uses for its artifacts such that a specially crafted Nomad jobspec can be used for privilege escalation onto client agent hosts. CVE-2022-30324 [GH-13057]
    Source code(tar.gz)
    Source code(zip)
  • v1.3.0(May 11, 2022)

    1.3.0 (May 11, 2022)

    FEATURES:

    • Edge compute improvements: Added support for reconnecting healthy allocations when disconnected clients reconnect. [GH-12476]
    • Native service discovery: Register and discover services using built-in simple service discovery. [GH-12368]

    BREAKING CHANGES:

    • agent: The state database on both clients and servers will automatically migrate its underlying database on startup. Downgrading to a previous version of an agent after upgrading it to Nomad 1.3 is not supported. [GH-12107]
    • client: The client state store will be automatically migrated to a new schema version when upgrading a client. Downgrading to a previous version of the client after upgrading it to Nomad 1.3 is not supported. To downgrade safely, users should erase the Nomad client's data directory. [GH-12078]
    • connect: Consul Service Identity ACL tokens automatically generated for Connect services are now created as Local rather than Global tokens. Nomad clusters with Connect services making cross-Consul datacenter requests will need to ensure their Consul agents are configured with anonymous ACL tokens of sufficient node and service read permissions. [GH-8068]
    • connect: The minimum Consul version supported by Nomad's Connect integration is now Consul v1.8.0. [GH-8068]
    • csi: The client filesystem layout for CSI plugins has been updated to correctly handle the lifecycle of multiple allocations serving the same plugin. Running plugin tasks will not be updated after upgrading the client, but it is recommended to redeploy CSI plugin jobs after upgrading the cluster. [GH-12078]
    • raft: The default raft protocol version is now 3 so you must follow the Upgrading to Raft Protocol 3 guide when upgrading an existing cluster to Nomad 1.3.0. Downgrading the raft protocol version is not supported. [GH-11572]
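
    For operators following the Upgrading to Raft Protocol 3 guide referenced above, a server agent configuration fragment along these lines pins the protocol explicitly; the bootstrap_expect value and data_dir path are assumptions for the example:

    ```hcl
    # Sketch only: agent configuration pinning the Raft protocol during upgrade.
    data_dir = "/opt/nomad/data"

    server {
      enabled          = true
      bootstrap_expect = 3

      # Explicit value per the Raft protocol 3 upgrade guidance above.
      raft_protocol = 3
    }
    ```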

    SECURITY:

    • server: validate mTLS certificate names on agent to agent endpoints [GH-11956]

    IMPROVEMENTS:

    • agent: Switch from boltdb/bolt to go.etcd.io/bbolt [GH-12107]
    • api: Add related query parameter to the Evaluation details endpoint [GH-12305]
    • api: Add support for filtering and pagination to the jobs and volumes list endpoint [GH-12186]
    • api: Add support for filtering and pagination to the node list endpoint [GH-12727]
    • api: Add support for filtering, sorting, and pagination to the ACL tokens and allocations list endpoint [GH-12186]
    • api: Added ParseHCLOpts helper func to ease parsing HCLv1 jobspecs [GH-12777]
    • api: CSI secrets for list and delete snapshots are now passed in HTTP headers [GH-12144]
    • api: AllocFS.Logs now explicitly closes frames channel after being canceled [GH-12248]
    • api: default to using DefaultPooledTransport client to support keep-alive by default [GH-12492]
    • api: filter values of evaluation and deployment list api endpoints [GH-12034]
    • api: sort return values of evaluation and deployment list api endpoints by creation index [GH-12054]
    • build: make targets now respect GOBIN variable [GH-12077]
    • build: upgrade and speedup circleci configuration [GH-11889]
    • cli: Added -json flag to nomad job {run,plan,validate} to support parsing JSON formatted jobs [GH-12591]
    • cli: Added -os flag to node status to display operating system name [GH-12388]
    • cli: Added nomad operator api command to ease querying Nomad's HTTP API. [GH-10808]
    • cli: CSI secrets argument for volume snapshot list has been made consistent with volume snapshot delete [GH-12144]
    • cli: Return a redacted value for mount flags in the volume status command, instead of <none> [GH-12150]
    • cli: operator debug command now skips generating pprofs to avoid a panic on Nomad 0.11.2, 0.11.1, and 0.11.0 [GH-12807]
    • cli: add nomad config validate command to check configuration files without an agent [GH-9198]
    • cli: added -pprof-interval to nomad operator debug command [GH-11938]
    • cli: display the Raft version instead of the Serf protocol in the nomad server members command [GH-12317]
    • cli: rename the nomad server members -detailed flag to -verbose so it matches other commands [GH-12317]
    • client: Added NOMAD_SHORT_ALLOC_ID allocation env var [GH-12603]
    • client: Allow interpolation of the network.dns block [GH-12021]
    • client: Download up to 3 artifacts concurrently [GH-11531]
    • client: Enable support for cgroups v2 [GH-12274]
    • client: fingerprint AWS instance life cycle option [GH-12371]
    • client: set NOMAD_CPU_CORES environment variable when reserving cpu cores [GH-12496]
    • connect: automatically set alloc_id in envoy_stats_tags configuration [GH-12543]
    • connect: bootstrap envoy sidecars using -proxy-for [GH-12011]
    • consul/connect: write Envoy bootstrapping information to disk for debugging [GH-11975]
    • consul: Added implicit Consul constraint for task groups utilising Consul service and check registrations [GH-12602]
    • consul: add go-sockaddr templating support to nomad consul address [GH-12084]
    • consul: improve service name validation message to include maximum length requirement [GH-12012]
    • core: Enable configuring raft boltdb freelist sync behavior [GH-12107]
    • core: The unused protocol_version agent configuration value has been removed. [GH-11600]
    • csi: Add pagination parameters to volume snapshot list command [GH-12193]
    • csi: Added -secret and -parameter flags to volume snapshot create command [GH-12360]
    • csi: Added support for storage topology [GH-12129]
    • csi: Allow for concurrent plugin allocations [GH-12078]
    • csi: Allow volumes to be re-registered to be updated while not in use [GH-12167]
    • csi: Display plugin capabilities in nomad plugin status -verbose output [GH-12116]
    • csi: Respect the verbose flag in the output of volume status [GH-12153]
    • csi: Sort allocations in plugin status output [GH-12154]
    • csi: add flag for providing secrets as a set of key/value pairs to delete a volume [GH-11245]
    • csi: allow namespace field to be passed in volume spec [GH-12400]
    • deps: Update hashicorp/raft-boltdb to v2.2.0 [GH-12107]
    • deps: Update serf library to v0.9.7 [GH-12130]
    • deps: Updated hashicorp/consul-template to v0.29.0 [GH-12747]
    • deps: Updated hashicorp/raft to v1.3.5 [GH-12079]
    • deps: Upgrade kr/pty to creack/pty v1.1.5 [GH-11855]
    • deps: use gorilla package for gzip http handler [GH-11843]
    • drainer: defer draining CSI plugin jobs until system jobs are drained [GH-12324]
    • drivers/raw_exec: Add support for cgroups v2 in raw_exec driver [GH-12419]
    • drivers: removed support for restoring tasks created before Nomad 0.9 [GH-12791]
    • fingerprint: add support for detecting DigitalOcean environment [GH-12015]
    • metrics: Emit metrics regarding raft boltdb operations [GH-12107]
    • metrics: emit nomad.vault.token_last_renewal and nomad.vault.token_next_renewal metrics for Vault token renewal information [GH-12435]
    • namespaces: Allow adding custom metadata to namespaces. [GH-12138]
    • namespaces: Allow enabling/disabling allowed drivers per namespace. [GH-11807]
    • raft: The default raft protocol version is now 3. [GH-11572]
    • scheduler: Seed node shuffling with the evaluation ID to make the order reproducible [GH-12008]
    • scheduler: recover scheduler goroutines on panic [GH-12009]
    • server: Transfer Raft leadership in case the Nomad server fails to establish leadership [GH-12293]
    • server: store and check previous Raft protocol version to prevent downgrades [GH-12362]
    • services: Enable setting arbitrary address on Nomad or Consul service registration [GH-12720]
    • template: Upgraded from consul-template v0.25.2 to v0.28.0, which includes the sprig library of functions and more. [GH-12312]
    • ui: added visual indicators for disconnected allocations and client nodes [GH-12544]
    • ui: break long service tags into multiple lines [GH-11995]
    • ui: change sort-order of evaluations to be reverse-chronological [GH-12847]
    • ui: make buttons with confirmation more descriptive of their actions [GH-12252]
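
    The "arbitrary address on service registration" improvement above adds an address parameter to the service block. A hedged sketch, assuming clients are fingerprinted on AWS EC2 so the interpolated attribute exists:

    ```hcl
    # Sketch only: explicit advertise address on a service (GH-12720).
    job "address-demo" {
      datacenters = ["dc1"]

      group "api" {
        network {
          port "http" {
            to = 80
          }
        }

        service {
          name     = "api"
          provider = "nomad"
          port     = "http"
          # Advertise the node's public IPv4 (AWS fingerprint attribute).
          address  = "${attr.unique.platform.aws.public-ipv4}"
        }

        task "api" {
          driver = "docker"

          config {
            image = "nginxdemos/hello"
            ports = ["http"]
          }
        }
      }
    }
    ```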

    DEPRECATIONS:

    • Raft protocol version 2 is deprecated and will be removed in Nomad 1.4.0. [GH-11572]

    BUG FIXES:

    • api: Apply prefix filter when querying CSI volumes in all namespaces [GH-12184]
    • cleanup: prevent leaks from time.After [GH-11983]
    • client: Fixed a bug that could prevent a preempting alloc from ever starting. [GH-12779]
    • client: Fixed a bug where clients that retry blocking queries would not reset the correct blocking duration [GH-12593]
    • config: Fixed a bug where the reservable_cores setting was not respected [GH-12044]
    • core: Fixed auto-promotion of canaries in jobs with at least one task group without canaries. [GH-11878]
    • core: prevent malformed plans from crashing leader [GH-11944]
    • csi: Fixed a bug where plugin status commands could choose the incorrect plugin if a plugin with a name that matched the same prefix existed. [GH-12194]
    • csi: Fixed a bug where volume snapshot list did not correctly filter by plugin IDs. The -plugin parameter is required. [GH-12197]
    • csi: Fixed a bug where allocations with volume claims would fail their first placement after a reschedule [GH-12113]
    • csi: Fixed a bug where allocations with volume claims would fail to restore after a client restart [GH-12113]
    • csi: Fixed a bug where creating snapshots required a plugin ID instead of falling back to the volume's plugin ID [GH-12195]
    • csi: Fixed a bug where fields were missing from the Read Volume API response [GH-12178]
    • csi: Fixed a bug where garbage collected nodes would block releasing a volume [GH-12350]
    • csi: Fixed a bug where per-alloc volumes used the incorrect ID when querying for alloc status -verbose [GH-12573]
    • csi: Fixed a bug where plugin configuration updates were not considered destructive [GH-12774]
    • csi: Fixed a bug where plugins would not restart if they failed any time after a client restart [GH-12752]
    • csi: Fixed a bug where plugins written in NodeJS could fail to fingerprint [GH-12359]
    • csi: Fixed a bug where purging a job with a missing plugin would fail [GH-12114]
    • csi: Fixed a bug where single-use access modes were not enforced during validation [GH-12337]
    • csi: Fixed a bug where the maximum number of volume claims was incorrectly enforced when an allocation claims a volume [GH-12112]
    • csi: Fixed a bug where the plugin instance manager would not retry the initial gRPC connection to plugins [GH-12057]
    • csi: Fixed a bug where the plugin supervisor would not restart the task if it failed to connect to the plugin [GH-12057]
    • csi: Fixed a bug where volume snapshot timestamps were always zero values [GH-12352]
    • csi: Fixed bug where accessing plugins was subject to a data race [GH-12553]
    • csi: fixed a bug where volume detach, volume deregister, and volume status commands did not accept an exact ID if multiple volumes matched the prefix [GH-12051]
    • csi: provide CSI_ENDPOINT environment variable to plugin tasks [GH-12050]
    • jobspec: Fixed a bug where connect sidecar resources were ignored when using HCL1 [GH-11927]
    • lifecycle: Fixed a bug where successful poststart tasks were marked as unhealthy [GH-11945]
    • recommendations (Enterprise): Fixed a bug where the recommendations list RPC incorrectly forwarded requests to the authoritative region [GH-12040]
    • scheduler: fixed a bug where in-place updates on ineligible nodes would be ignored [GH-12264]
    • server: Write peers.json file with correct permissions [GH-12369]
    • template: Fixed a bug that prevented allowing all consul-template functions. [GH-12312]
    • template: Fixed a bug where the default function_denylist would be appended to a specified list [GH-12071]
    • ui: Fix the link target for CSI volumes on the task detail page [GH-11896]
    • ui: Fixed a bug where volumes were being incorrectly linked when per_alloc=true [GH-12713]
    • ui: fix broken link to task-groups in the Recent Allocations table in the Job Detail overview page. [GH-12765]
    • ui: fix the unit for the task row memory usage metric [GH-11980]
    Source code(tar.gz)
    Source code(zip)
  • v1.2.7(May 11, 2022)

    1.2.7 (May 10, 2022)

    SECURITY:

    • server: validate mTLS certificate names on agent to agent endpoints [GH-11956]

    IMPROVEMENTS:

    • build: upgrade and speedup circleci configuration [GH-11889]

    BUG FIXES:

    • lifecycle: Fixed a bug where successful poststart tasks were marked as unhealthy [GH-11945]
    • api: Apply prefix filter when querying CSI volumes in all namespaces [GH-12184]
    • cleanup: prevent leaks from time.After [GH-11983]
    • client: Fixed a bug that could prevent a preempting alloc from ever starting. [GH-12779]
    • client: Fixed a bug where clients that retry blocking queries would not reset the correct blocking duration [GH-12593]
    • config: Fixed a bug where the reservable_cores setting was not respected [GH-12044]
    • core: Fixed auto-promotion of canaries in jobs with at least one task group without canaries. [GH-11878]
    • core: prevent malformed plans from crashing leader [GH-11944]
    • csi: Fixed a bug where plugin status commands could choose the incorrect plugin if a plugin with a name that matched the same prefix existed. [GH-12194]
    • csi: Fixed a bug where volume snapshot list did not correctly filter by plugin IDs. The -plugin parameter is required. [GH-12197]
    • csi: Fixed a bug where allocations with volume claims would fail their first placement after a reschedule [GH-12113]
    • csi: Fixed a bug where allocations with volume claims would fail to restore after a client restart [GH-12113]
    • csi: Fixed a bug where creating snapshots required a plugin ID instead of falling back to the volume's plugin ID [GH-12195]
    • csi: Fixed a bug where fields were missing from the Read Volume API response [GH-12178]
    • csi: Fixed a bug where garbage collected nodes would block releasing a volume [GH-12350]
    • csi: Fixed a bug where per-alloc volumes used the incorrect ID when querying for alloc status -verbose [GH-12573]
    • csi: Fixed a bug where plugin configuration updates were not considered destructive [GH-12774]
    • csi: Fixed a bug where plugins would not restart if they failed any time after a client restart [GH-12752]
    • csi: Fixed a bug where plugins written in NodeJS could fail to fingerprint [GH-12359]
    • csi: Fixed a bug where purging a job with a missing plugin would fail [GH-12114]
    • csi: Fixed a bug where single-use access modes were not enforced during validation [GH-12337]
    • csi: Fixed a bug where the maximum number of volume claims was incorrectly enforced when an allocation claims a volume [GH-12112]
    • csi: Fixed a bug where the plugin instance manager would not retry the initial gRPC connection to plugins [GH-12057]
    • csi: Fixed a bug where the plugin supervisor would not restart the task if it failed to connect to the plugin [GH-12057]
    • csi: Fixed a bug where volume snapshot timestamps were always zero values [GH-12352]
    • csi: Fixed bug where accessing plugins was subject to a data race [GH-12553]
    • csi: fixed a bug where volume detach, volume deregister, and volume status commands did not accept an exact ID if multiple volumes matched the prefix [GH-12051]
    • csi: provide CSI_ENDPOINT environment variable to plugin tasks [GH-12050]
    • jobspec: Fixed a bug where connect sidecar resources were ignored when using HCL1 [GH-11927]
    • scheduler: fixed a bug where in-place updates on ineligible nodes would be ignored [GH-12264]
    • ui: Fix the link target for CSI volumes on the task detail page [GH-11896]
    • ui: fix the unit for the task row memory usage metric [GH-11980]
    Source code(tar.gz)
    Source code(zip)
  • v1.1.13(May 11, 2022)

    1.1.13 (May 10, 2022)

    SECURITY:

    • server: validate mTLS certificate names on agent to agent endpoints [GH-11956]

    IMPROVEMENTS:

    • api: Updated the CSI volumes list API to respect wildcard namespaces [GH-11724]
    • build: upgrade and speedup circleci configuration [GH-11889]

    BUG FIXES:

    • lifecycle: Fixed a bug where successful poststart tasks were marked as unhealthy [GH-11945]
    • api: Apply prefix filter when querying CSI volumes in all namespaces [GH-12184]
    • cleanup: prevent leaks from time.After [GH-11983]
    • client: Fixed a bug that could prevent a preempting alloc from ever starting. [GH-12779]
    • client: Fixed a bug where clients that retry blocking queries would not reset the correct blocking duration [GH-12593]
    • config: Fixed a bug where the reservable_cores setting was not respected [GH-12044]
    • core: Fixed auto-promotion of canaries in jobs with at least one task group without canaries. [GH-11878]
    • core: prevent malformed plans from crashing leader [GH-11944]
    • csi: Fixed a bug where plugin status commands could choose the incorrect plugin if a plugin with a name that matched the same prefix existed. [GH-12194]
    • csi: Fixed a bug where volume snapshot list did not correctly filter by plugin IDs. The -plugin parameter is required. [GH-12197]
    • csi: Fixed a bug where allocations with volume claims would fail their first placement after a reschedule [GH-12113]
    • csi: Fixed a bug where allocations with volume claims would fail to restore after a client restart [GH-12113]
    • csi: Fixed a bug where creating snapshots required a plugin ID instead of falling back to the volume's plugin ID [GH-12195]
    • csi: Fixed a bug where fields were missing from the Read Volume API response [GH-12178]
    • csi: Fixed a bug where garbage collected nodes would block releasing a volume [GH-12350]
    • csi: Fixed a bug where per-alloc volumes used the incorrect ID when querying for alloc status -verbose [GH-12573]
    • csi: Fixed a bug where plugin configuration updates were not considered destructive [GH-12774]
    • csi: Fixed a bug where plugins would not restart if they failed any time after a client restart [GH-12752]
    • csi: Fixed a bug where plugins written in NodeJS could fail to fingerprint [GH-12359]
    • csi: Fixed a bug where purging a job with a missing plugin would fail [GH-12114]
    • csi: Fixed a bug where single-use access modes were not enforced during validation [GH-12337]
    • csi: Fixed a bug where the maximum number of volume claims was incorrectly enforced when an allocation claims a volume [GH-12112]
    • csi: Fixed a bug where the plugin instance manager would not retry the initial gRPC connection to plugins [GH-12057]
    • csi: Fixed a bug where the plugin supervisor would not restart the task if it failed to connect to the plugin [GH-12057]
    • csi: Fixed a bug where volume snapshot timestamps were always zero values [GH-12352]
    • csi: Fixed bug where accessing plugins was subject to a data race [GH-12553]
    • csi: fixed a bug where volume detach, volume deregister, and volume status commands did not accept an exact ID if multiple volumes matched the prefix [GH-12051]
    • csi: provide CSI_ENDPOINT environment variable to plugin tasks [GH-12050]
    • jobspec: Fixed a bug where connect sidecar resources were ignored when using HCL1 [GH-11927]
    • scheduler: fixed a bug where in-place updates on ineligible nodes would be ignored [GH-12264]
    • ui: Fix the link target for CSI volumes on the task detail page [GH-11896]
    • ui: fix the unit for the task row memory usage metric [GH-11980]
    Source code(tar.gz)
    Source code(zip)
Owner
HashiCorp
Consistent workflows to provision, secure, connect, and run any infrastructure for any application.
Related projects:
• kube-batch: A batch scheduler for Kubernetes, providing mechanisms for applications that want to run batch jobs for high-performance workloads such as AI/ML, BigData, and HPC. (Kubernetes SIGs, 1k stars, Nov 14, 2022)
• certdeploy: Deploy HTTPS certificates non-interactively to CDN services. (三三, 1 star, Nov 12, 2021)
• Natural-deploy: A natural and simple way to deploy workloads or anything on other machines. (Akilan Selvacoumar, 0 stars, Jan 3, 2022)
• fortress-csi: The Container Storage Interface (CSI) driver for Fortress Block Storage, which lets you use Fortress Block Storage with your container orchestrator. (Fortress, 0 stars, Jan 23, 2022)
• Kubernetes: An open source system for managing containerized applications across multiple hosts. (0 stars, Nov 25, 2021)
• Fleex: Create multiple VPS on cloud providers and use them to distribute your workload, running tools like masscan, puredns, ffuf, and httpx. (174 stars, Nov 17, 2022)
• Adagio: A workflow orchestrator, currently in a constant state of flux. (George, 88 stars, Sep 2, 2022)
• Orchestrator Service: An orchestrator service in Go; prerequisites include the protoc compiler, a code editor, Postman, and BloomRPC. (MEET PATEL, 1 star, Feb 15, 2022)
• Orchestrator Service (gRPC): A simple orchestrator service implemented using gRPC in Go that reads any request it receives and forwards it. (Mayank Pandey, 2 stars, Apr 5, 2022)
• Ensi-local-ctl (ELC): An orchestrator of development environments. (MadridianFox, 1 star, Oct 13, 2022)
• CloudFormation provider: Deploy, manage, and secure applications and resources across multiple clusters using CloudFormation and Shipa. (Shipa, 1 star, Feb 12, 2022)
• ko: A simple, fast container image builder for Go applications, used to build and deploy Go applications on Kubernetes. (Google, 5.3k stars, Nov 23, 2022)
• dokku-go-example: Easily deploy your Go applications with Dokku, including deployment on your own server, auto deployment, and HTTPS. (10 stars, Aug 21, 2022)
• Deployment mini-service: A small and easy web-hook server to deploy software on push from GitLab, GitHub, hg, and so on, without YAML-file headaches. (Roman Usachev, 10 stars, Jul 7, 2022)
• Nomad DigitalOcean Droplets Autoscaler: The do-droplets target plugin for the HashiCorp Nomad Autoscaler allows scaling of Nomad cluster clients by creating and destroying DigitalOcean Droplets. (Johan Siebens, 39 stars, Oct 24, 2022)
• Nomad Operator Example: A repository to go along with the blog post "The Operator Pattern in Nomad"; if you have tmux installed, you can run start.sh. (Andy Davies, 9 stars, May 12, 2022)
• go-pipeline-demo: A simple Go app and GitHub workflow that shows how to use GitHub Actions to test, build, and deploy a Go app to Docker Hub. (Marat Bogatyrev, 0 stars, Nov 17, 2021)
• Juniper Terraform - SRX: An example method to interact with Juniper SRX products with Terraform, building and deploying configurations for Juniper SRX firewalls. (Calvin Remsburg, 1 star, Mar 16, 2022)
• A metrics library providing a package which can be used to instrument code, expose application metrics, and profile runtime performance in a flexible manner. (0 stars, Jan 18, 2022)