Open-metrics endpoint collector for ONTAP

Overview

NetApp Harvest 2.0

The Swiss Army knife for monitoring datacenters. The default package collects performance, capacity and hardware metrics from ONTAP clusters. New metrics can be collected by editing the config files. Metrics can be delivered to Prometheus and InfluxDB databases - and displayed in Grafana dashboards.

Harvest's architecture is flexible in how it collects, augments, and exports data. Think of it as a framework for running collectors and exporters concurrently. You are more than welcome to contribute your own collector, plugin or exporter (start with our ARCHITECTURE.md).

Requirements

Harvest is written in Go, which means it runs on recent Linux systems. It also runs on Macs, but the process isn't as smooth yet.

Optional prerequisites:

  • dialog or whiptail (used by the config utility)
  • openssl (used by config)

Hardware requirements depend on how many clusters you monitor and the number of metrics you choose to collect. With the default configuration, when monitoring 10 clusters, we recommend:

  • CPU: 2 cores
  • Memory: 1 GB
  • Disk: 500 MB (mostly used by log files)

Harvest is compatible with:

  • Prometheus: 2.24 or higher
  • InfluxDB: v2
  • Grafana: 7.4.2 or higher

Installation / Upgrade

We provide pre-compiled binaries for Linux, RPMs, and Debs.

Pre-compiled Binaries

Visit the Releases page and copy the tar.gz link you want to download. For example, to download the v21.05.1 release:

wget https://github.com/NetApp/harvest/releases/download/v21.05.1/harvest-21.05.1-1.tar.gz
tar -xvf harvest-21.05.1-1.tar.gz
cd harvest-21.05.1-1

# Run Harvest with the default unix localhost collector
bin/harvest start

If you don't have wget installed, you can use curl like so:

curl -L -O https://github.com/NetApp/harvest/releases/download/v21.05.1/harvest-21.05.1-1.tar.gz

Redhat

Installation of the Harvest package may require root or administrator privileges

Download the latest rpm of Harvest from the releases tab and install with yum.

  $ sudo yum install harvest.XXX.rpm

Once the installation has finished, edit the harvest.yml configuration file located in /opt/harvest/harvest.yml

After editing /opt/harvest/harvest.yml, manage Harvest with systemctl start|stop|restart harvest

Changes install makes

  • Directories /var/log/harvest/ and /var/log/run/ are created
  • A harvest user and group are created and the installed files are chowned to harvest
  • Systemd /etc/systemd/system/harvest.service file is created and enabled

Debian

Installation of the Harvest package may require root or administrator privileges

Download the latest deb of Harvest from the releases tab and install with apt.

  $ sudo apt install ./harvest-<RELEASE>.amd64.deb

Once the installation has finished, edit the harvest.yml configuration file located in /opt/harvest/harvest.yml

After editing /opt/harvest/harvest.yml, manage Harvest with systemctl start|stop|restart harvest

Changes install makes

  • Directories /var/log/harvest/ and /var/log/run/ are created
  • A harvest user and group are created and the installed files are chowned to harvest
  • Systemd /etc/systemd/system/harvest.service file is created and enabled

Docker

Work in progress. Coming soon

Building from source

To build Harvest from source code, first make sure you have a working Go environment with version 1.15 or greater installed. You'll also need an Internet connection to install Go dependencies. If you need to build from an air-gapped machine, run go mod vendor on an Internet-connected machine first, and then copy the vendor directory to the air-gapped machine.

Clone the repo and build everything.

git clone https://github.com/NetApp/harvest.git
cd harvest
make
bin/harvest version

If you're building on a Mac, use GOOS=darwin make build

Check out the Makefile for other targets of interest.

Quick start

1. Configuration file

Harvest's configuration information is defined in harvest.yml. There are a few ways to tell Harvest how to load this file:

  • If you don't use the --config flag, the harvest.yml file located in the current working directory will be used

  • If you specify the --config flag like so harvest status --config /opt/harvest/harvest.yml, Harvest will use that file

To start collecting metrics, you need to define at least one poller and one exporter in your configuration file. The default configuration comes with a pre-configured poller named unix which collects metrics from the local system. This is useful if you want to monitor resource usage by Harvest and serves as a good example. Feel free to delete it if you want.

The next step is to add pollers for your ONTAP clusters in the Pollers section of the configuration file. Refer to the Harvest Configuration Section for more details.
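As a sketch, a minimal harvest.yml with one ONTAP poller wired to one Prometheus exporter could look like the following (the poller name, datacenter, address, and port are placeholders; adjust them to your environment):

```yaml
# Minimal harvest.yml sketch - all values below are placeholders
Exporters:
  prometheus1:
    exporter: Prometheus   # exporter class
    port: 12990            # port the poller's HTTP endpoint listens on

Pollers:
  jamaica:                 # user-defined poller name
    datacenter: dc-01
    addr: 10.0.0.1         # IPv4 or FQDN of the ONTAP cluster
    collectors:
      - Zapi
      - ZapiPerf
    exporters:
      - prometheus1        # references the exporter instance above, not the class
```

With this file in the current working directory, bin/harvest start jamaica starts the poller without needing the --config flag.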

2. Start Harvest

Start all Harvest pollers as daemons:

$ bin/harvest start

Or start specific pollers:

$ bin/harvest start jamaica grenada

Replace jamaica and grenada with the poller names you defined in harvest.yml. The logs of each poller can be found in /var/log/harvest/.

3. Import Grafana dashboards

The Grafana dashboards are located in the $HARVEST_HOME/grafana directory. You can manually import the dashboards or use the harvest grafana command (more documentation).

Note: the current dashboards specify Prometheus as the datasource. If you use the InfluxDB exporter, you will need to create your own dashboards.

4. Verify the metrics

If you use a Prometheus Exporter, open a browser and navigate to http://0.0.0.0:12990/ (replace 12990 with the port number of your poller). This is the HTTP endpoint Harvest creates for your Prometheus exporter. This page provides a real-time generated list of running collectors and names of exported metrics.

The metric data that's exposed for Prometheus to scrape is available at http://0.0.0.0:12990/metrics/. For more help on how to configure Prometheus, see the Prometheus exporter documentation.

If you can't access the URL, check the logs of your pollers. These are located in /var/log/harvest/.

Harvest Configuration

The main configuration file, harvest.yml, consists of the following sections, described below:

Pollers

All pollers are defined in harvest.yml, the main configuration file of Harvest, under the section Pollers.

  • Poller name (header) (required): poller name, user-defined value
  • datacenter (required): datacenter name, user-defined value
  • addr (required by some collectors): IPv4 address or FQDN of the target system
  • collectors (required): list of collectors to run for this poller
  • exporters (required): list of exporter names from the Exporters section. Note: this should be the name of the exporter (e.g. prometheus1), not the value of its exporter key (e.g. Prometheus)
  • auth_style (required by Zapi* collectors): either basic_auth or certificate_auth. Default: basic_auth
  • username, password (required if auth_style is basic_auth)
  • ssl_cert, ssl_key (optional if auth_style is certificate_auth): absolute paths to the SSL (client) certificate and key used to authenticate with the target system. If not provided, the poller will look for <hostname>.key and <hostname>.pem in $HARVEST_HOME/cert/. To create certificates for ONTAP systems, see the Zapi documentation
  • use_insecure_tls (optional, bool): if true, disables TLS verification when connecting to the ONTAP cluster. Default: false
  • log_max_bytes: maximum size of a log file before it is rotated. Default: 10000000 (10 MB)
  • log_max_files: number of rotated log files to keep. Default: 10
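For illustration, the two auth_style values described above might be configured as follows (hostnames, credentials, and certificate paths are hypothetical):

```yaml
Pollers:
  cluster-a:                     # basic_auth: username and password
    datacenter: dc-01
    addr: cluster-a.example.com
    auth_style: basic_auth
    username: harvest-user
    password: "secret"
    collectors: [Zapi]
    exporters: [prometheus1]     # assumes a prometheus1 exporter is defined

  cluster-b:                     # certificate_auth: client certificate and key
    datacenter: dc-01
    addr: cluster-b.example.com
    auth_style: certificate_auth
    ssl_cert: /opt/harvest/cert/cluster-b.pem
    ssl_key: /opt/harvest/cert/cluster-b.key
    collectors: [Zapi]
    exporters: [prometheus1]
```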

Defaults

This section is optional. If there are parameters identical for all your pollers (e.g. datacenter, authentication method, login preferences), they can be grouped under this section. The poller section will be checked first and if the values aren't found there, the defaults will be consulted.
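A short sketch of this lookup order, with placeholder values: parameters under Defaults apply to every poller, and a poller's own section wins when both define the same key.

```yaml
Defaults:
  datacenter: dc-01          # applies to every poller below
  auth_style: basic_auth
  username: harvest-user
  password: "secret"

Pollers:
  jamaica:                   # inherits datacenter and credentials from Defaults
    addr: 10.0.0.1
    collectors: [Zapi]
    exporters: [prometheus1] # assumes a prometheus1 exporter is defined
  grenada:                   # overrides the default datacenter
    datacenter: dc-02
    addr: 10.0.0.2
    collectors: [Zapi]
    exporters: [prometheus1]
```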

Exporters

All exporters need two types of parameters:

  • exporter parameters - defined in harvest.yml under Exporters section
  • export_options - these options are defined in the Matrix data structure that is emitted from collectors and plugins

The following two parameters are required for all exporters:

  • Exporter name (header) (required): name of the exporter instance, a user-defined value
  • exporter (required): name of the exporter class (e.g. Prometheus, InfluxDB, Http); these can be found under the cmd/exporters/ directory

Note: when we talk about the Prometheus Exporter or InfluxDB Exporter, we mean the Harvest modules that send the data to a database, NOT the names used to refer to the actual databases.
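To make the two required parameters concrete, here is a sketch of an Exporters section (the instance name and port are placeholders; parameters beyond exporter vary per exporter class):

```yaml
Exporters:
  prometheus1:             # exporter instance name - this is what pollers reference
    exporter: Prometheus   # exporter class, found under cmd/exporters/
    port: 12990            # class-specific parameter (illustrative)
```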

Prometheus Exporter

InfluxDB Exporter

Tools

This section is optional. You can uncomment the grafana_api_token key and add your Grafana API token so Harvest does not prompt you for the token when importing dashboards.

Tools:
  #grafana_api_token: 'aaa-bbb-ccc-ddd'

Configuring collectors

Collectors are configured by their own configuration files, located in subdirectories of conf/. Each collector can define its own set of parameters.

Zapi

ZapiPerf

Unix

Comments
  • Pollers aren't running on fresh Docker install

    Describe the bug Pollers aren't running

    Environment Provide accurate information about the environment to help us reproduce the issue.

    • Harvest version: [copy paste output of harvest --version] harvest version 21.08.0-6 (commit 485d191) (build date 2021-08-31T11:51:03-0400) linux/amd64
    • Command line arguments used: [e.g. bin/harvest start --config=foo.yml --collectors Zapi] started docker per the easy-start page
    • OS: [e.g. RHEL 7.6, Ubuntu 21.04, Docker 19.03.1-CE] RedHat 8.4
    • Install method: [yum, rhel, native, docker] docker
    • ONTAP Version: [e.g. 9.7, 9.8, 9.9, 7mode] 9.7P7
    • Other:

    To Reproduce Installed and set up to use docker. Started the containers per the docs; I can see them download and start, but the pollers never report in. Checking them with ./bin/harvest status reports that the pollers are there but not running.

    Expected behavior pollers would be running, I would see my datacenters & clusters in Grafana

    Actual behavior pollers are not running, only "local" listed in Grafana

    Possible solution, workaround, fix Tell us if there is a workaround or fix

    Additional context Add any other context about the problem here. Share logs, error output, etc.

    customer 
    opened by survive-wustl 38
  • Qtree 7mode Template is not available in this repository

    Hi, I did not find qtree 7mode template. Should I build one myself with qtree-list api? Or there is one built already but this repository is not updated?

    feature status/done 
    opened by jmg011 26
  • Storage type metrics is missing

    Hi, we saw that there is no option to pull metrics about disk type (HDD, SSD). Can you add a metric that gathers info about the disks and outputs the size, the type, etc.?

    feature status/done 
    opened by bengoldenberg 26
  • Qtree metrics are intermittently showing 0 value

    qtree_disk_limit is showing intermittent 0 and correct value. qtree_disk_used is showing intermittent 0 and correct value.

    An example of a qtree_disk_limit

    An example of a qtree_disk_used

    status/done customer 
    opened by jmg011 22
  • Add a StorageGRID collector

    As a storage administrator, I'm looking to have a single tool to monitor my NetApp environment. ActiveIQ Unified Manager, Harvest and NABox are greats tools but focused only to ONTAP. It will be great if we can have StorageGRID monitoring inside Harvest.

    feature priority/P2 status/done 
    opened by rvalab 22
  • Making dashboards compatible with grafana-API

    We want to import grafana-dashboards with Grafana own API. Since we make some specific modification on our dashboard (own variables for region like in the screenshot for example) it would be nice to support the own Grafana-API as well.

    From my current perspective it should be only needed to add some specific identifiers to get them compatible (importable): URL

    Expected behavior Grafana Dashboard should be importable via grafana-API

    Actual behavior own modifications to json-files are needed - afterwards dashboard is empty (not sure if this is maybe a self-made error)

    opened by florianmulatz 19
  • Harvest should collect environment sensor info

    Harvest doesn't collect any chassis sensor info

    Describe the solution you'd like Harvest doesn't collect any chassis sensor info. There are a few important scenarios where having this information is really useful, especially NVMEM battery status, which when failed causes a shutdown of a cluster in 24 hours.

    Describe alternatives you've considered N/A

    Additional context This information is available from the environment-sensors-get-iter API; it should be fairly easy to map the values.

    Example data

    critical-low-threshold  : 1086
    node-name               : node
    sensor-name             : PVDDQ DDR4 AB
    sensor-type             : voltage
    threshold-sensor-state  : normal
    threshold-sensor-value  : 1228
    value-units             : mV
    warning-high-threshold  : 1299
    warning-low-threshold   : 1100
    
    feature status/done 
    opened by hashi825 19
  • Getting "error reading api response => 408 Request Timeout" when running in K8s

    Describe the bug We have multiple pollers running in k8s. One of them is working fine on local laptops under docker but in K8s throws 408 errors.

    Environment Provide accurate information about the environment to help us reproduce the issue.

    • Harvest version: 21.05.3-2
    • Command line arguments used: See below
    • OS: Kubernetes
    • Install method: docker
    • ONTAP Version: 9.3P4

    To Reproduce

    Deployment:

    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: CLUSTER_NAME 
      namespace: netapp-metrics
      labels:
        app: CLUSTER_NAME 
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: CLUSTER_NAME 
      template:
        metadata:
          labels:
            app: CLUSTER_NAME 
        spec:
          nodeSelector:
            topology.kubernetes.io/zone: ZONE_NAME
          volumes:
            - name: zapi-default-config
              configMap:
                name: zapi-default-config
    
            - name: harvest-poller-config
              secret:
                secretName: harvest-poller-config
    
          containers:
          - name: CLUSTER_NAME 
            image: REGISTRY/harvest:21.05.3-2
            args:
            - "--config"
            - "/harvest.yml"
            - "--poller"
            - "CLUSTER_NAME "
            resources:
              limits:
                memory: 2G
                cpu: 1000m
              requests:
                memory: 1G
                cpu: 500m
            ports:
            - name: http
              containerPort: 12990
            volumeMounts:
              - name: zapi-default-config
                mountPath: /opt/harvest/conf/zapi/default.yaml
                subPath: default.yaml
              - name: harvest-poller-config
                mountPath: /harvest.yml
                subPath: harvest.yml
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: CLUSTER_NAME 
      name: CLUSTER_NAME 
      namespace: netapp-metrics
    spec:
      ports:
      - port: 12990
        targetPort: 12990
        name: metrics
      selector:
        app: CLUSTER_NAME 
      type: ClusterIP
    ---
    kind: ServiceMonitor
    apiVersion: monitoring.coreos.com/v1
    metadata:
      name: CLUSTER_NAME 
      namespace: netapp-metrics
      labels:
        prometheus: netapp
        prometheusEnv: prd
    spec:
      selector:
        matchLabels:
          app: CLUSTER_NAME 
      endpoints:
      - port: metrics
        interval: 1m
        path: /metrics
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: CLUSTER_NAME -perf
      namespace: netapp-metrics
      labels:
        app: CLUSTER_NAME -perf
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: CLUSTER_NAME -perf
      template:
        metadata:
          labels:
            app: CLUSTER_NAME -perf
        spec:
          nodeSelector:
            topology.kubernetes.io/zone: ZONE_NAME
          volumes:
            - name: zapi-default-config
              configMap:
                name: zapi-default-config
    
            - name: harvest-poller-config
              secret:
                secretName: harvest-poller-config
    
          containers:
          - name: CLUSTER_NAME -perf
            image: REGISTRY/harvest:21.05.3-2
            args:
            - "--config"
            - "/harvest.yml"
            - "--poller"
            - "CLUSTER_NAME -perf"
            resources:
              limits:
                memory: 2G
                cpu: 1000m
              requests:
                memory: 1G
                cpu: 500m
            ports:
            - name: http
              containerPort: 12990
            volumeMounts:
              - name: zapi-default-config
                mountPath: /opt/harvest/conf/zapi/default.yaml
                subPath: default.yaml
              - name: harvest-poller-config
                mountPath: /harvest.yml
                subPath: harvest.yml
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: CLUSTER_NAME -perf
      name: CLUSTER_NAME -perf
      namespace: netapp-metrics
    spec:
      ports:
      - port: 12990
        targetPort: 12990
        name: metrics
      selector:
        app: CLUSTER_NAME -perf
      type: ClusterIP
    ---
    kind: ServiceMonitor
    apiVersion: monitoring.coreos.com/v1
    metadata:
      name: CLUSTER_NAME -perf
      namespace: netapp-metrics
      labels:
        prometheus: netapp
        prometheusEnv: prd
    spec:
      selector:
        matchLabels:
          app: CLUSTER_NAME -perf
      endpoints:
      - port: metrics
        interval: 1m
        path: /metrics
    
    

    Poller Config:

    netapp-harvest/docker/harvest.yml

    ---
    Pollers:
      CLUSTER_NAME:
        datacenter: DATACENTER_NAME
        collectors:
          - Zapi
        addr: CLUSTER_NAME
        exporters:
          - promethues
      CLUSTER_NAME-perf:
        datacenter: DATACENTER_NAME
        collectors:
          - Zapiperf
        addr: CLUSTER_NAME
        exporters:
          - promethues
    
    

    default.yaml:

    netapp-harvest/docker/default.yaml

    ---
    collector:          Zapi
    
    # Order here matters!
    schedule:
      - instance: 300s
      - data: 50s
    
    client_timeout: 40
    
    objects:
      Node:             node.yaml
      Aggregate:        aggr.yaml
      Volume:           volume.yaml
      SnapMirror:       snapmirror.yaml
      Disk:             disk.yaml
      Shelf:            shelf.yaml
      Status:           status.yaml
      Subsystem:        subsystem.yaml
      Lun:              lun.yaml
    
    

    Expected behavior Poller should collect metrics

    Actual behavior

    5:50PM ERR goharvest2/cmd/poller/collector/collector.go:303 >  error="error reading api response => 408 Request Timeout" Poller=CLUSTER_NAME collector=Zapi:Aggregate stack=[{"func":"New","line":"35","source":"errors.go"},{"func":"(*Client).invoke","line":"369","source":"client.go"},{"func":"(*Client).InvokeBatchWithTimers","line":"281","source":"client.go"},{"func":"(*Zapi).PollData","line":"350","source":"zapi.go"},{"func":"(*task).Run","line":"60","source":"schedule.go"},{"func":"(*AbstractCollector).Start","line":"269","source":"collector.go"},{"func":"goexit","line":"1371","source":"asm_amd64.s"}]
    
    
    opened by chvvkumar 19
  • Collecting Prometheus Histogram Stats with Harvest

    Is your feature request related to a problem? Please describe. We have a requirement to monitor some stats that are of type histogram. However, after enabling them in harvest, we have noticed that the stat is not formatted in a Prometheus histogram format. This means the Prometheus function call "histogram_quantile" does not work. At the moment, histogram stats seem unusable in their current format (as far as we can tell).

    Describe the solution you'd like Format the histograms collected by Harvest in a Prometheus histogram format. A Prometheus histogram generates 3 stats from one NetApp histogram collected:

    1. <stat_name>_bucket{le="<bucket_name>"}
    2. <stat_name>_sum
    3. <stat_name>_count

    Details about each can be found here: Prometheus Histogram

    Currently, histograms get formatted like this:

    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<90s"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<10s"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<200us"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<600ms"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<14ms"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<100ms"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<8s"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<1s"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<20s"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<4ms"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<100us"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<60us"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<2ms"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<1ms"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<6s"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<30s"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<6ms"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<60ms"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<600us"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<4s"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<80us"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric=">=120s"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<200ms"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<800us"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<2s"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<800ms"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<20ms"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<400ms"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<18ms"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<120s"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<8ms"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<60s"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<40us"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<16ms"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<12ms"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<80ms"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<400us"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<10ms"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<40ms"} 0
    svm_cifs_cifs_latency_hist{datacenter="DC1",cluster="cluster1",svm="svm1",metric="<20us"} 0
    

    This same stat would get formatted like this to match the Prometheus scheme:

    # HELP netapp_perf_cifs_vserver_cifs_latency_hist netapp_perf_cifs_vserver_cifs_latency_hist cifs_latency_hist
    # TYPE netapp_perf_cifs_vserver_cifs_latency_hist histogram
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="0.02"} 1439
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="0.04"} 2579
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="0.06"} 2607
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="0.08"} 2608
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="0.1"} 2611
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="0.2"} 2759
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="0.4"} 2908
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="0.6"} 3108
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="0.8"} 3119
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="1"} 3126
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="2"} 3143
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="4"} 3172
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="6"} 3179
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="8"} 3215
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="10"} 3220
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="12"} 3228
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="14"} 3232
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="16"} 3235
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="18"} 3238
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="20"} 3242
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="40"} 3253
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="60"} 3256
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="80"} 3256
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="100"} 3257
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="200"} 3262
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="400"} 3265
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="600"} 3265
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="800"} 3265
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="1000"} 3265
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="2000"} 3265
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="4000"} 3265
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="6000"} 3265
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="8000"} 3265
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="10000"} 3265
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="20000"} 3265
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="30000"} 3265
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="60000"} 3265
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="90000"} 3265
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="120000"} 3265
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="240000"} 3265
    netapp_perf_cifs_vserver_cifs_latency_hist_bucket{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1",le="+Inf"} 3265
    netapp_perf_cifs_vserver_cifs_latency_hist_sum{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1"} 4085.44
    netapp_perf_cifs_vserver_cifs_latency_hist_count{cluster_name="cluster1",instance="collector:25035",instance_name="svm1",vserver_name="svm1"} 3265
    

    To make this work, the "metric" label Harvest collects would need to be converted to a numeric value and exposed as the bucket label "le".

    Example: metric="<20us" => le="0.02" (standardizing all buckets to ms)
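    The label conversion described above can be sketched as follows. This is a minimal illustration, not Harvest code; the function name and unit table are hypothetical:

    ```python
    import re

    # Conversion factors to milliseconds, since the example standardizes
    # all buckets to ms (20us -> 0.02ms).
    UNIT_TO_MS = {"us": 0.001, "ms": 1.0, "s": 1000.0}

    def label_to_le(label: str) -> str:
        """Convert an ONTAP bucket label like '<20us' to an 'le' value like '0.02'."""
        m = re.fullmatch(r"<\s*(\d+(?:\.\d+)?)(us|ms|s)", label)
        if m is None:
            raise ValueError(f"unrecognized bucket label: {label!r}")
        value, unit = float(m.group(1)), m.group(2)
        # Format without trailing zeros, matching le="0.02" in the example.
        return format(value * UNIT_TO_MS[unit], "g")

    print(label_to_le("<20us"))  # 0.02
    ```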

    Describe alternatives you've considered: We can currently collect these histograms with an internal perf collection tool, but we would like to move all collection to Harvest, since it is better in many ways.

    feature status/done customer 22.11 
    opened by rodenj1 17
  • Flexgroup Read Latency

    Flexgroup Read Latency

    Describe the bug: FlexGroup read latency is incorrect.

    Environment: ONTAP 9.10.1P3, NABOX 3.1 (install method: OVA image)

    To Reproduce:

    1. Create a FlexGroup on NetApp
    2. Execute a workload (read/write operations)
    3. Open NABOX to watch the latencies
    4. Open the dashboard: Harvest - cDOT / NetApp Detail: Volume - Details
    5. Expand the row: Volume WAFL Layer Drilldown
    6. Look at the panel Read Latency. Metric: topk($TopResources, volume_read_latency{datacenter="$Datacenter",cluster="$Cluster",svm=~"$SVM",volume=~"$Volume"})

    Expected behavior: Latencies similar to those obtained with the ONTAP CLI command "qos statistics volume latency show", i.e. below 5ms.

    Actual behavior: NABOX displays read latencies above 15ms.

    Possible solution, workaround, fix: Some metrics (volume_read_latency, volume_avg_latency) look incorrect for FlexGroup volumes and their constituents. They work fine for FlexVols.

    Additional context: Here is the latency output from "qos statistics volume latency show":

        Workload  ID  Latency   Network  Cluster   Data      Disk     QoS Max  QoS Min  NVRAM    Cloud  FlexCache  SM Sync  VA   AVSCAN
        -total-   -   515.00us  65.00us  137.00us  302.00us  0ms      0ms      0ms      11.00us  0ms    0ms        0ms      0ms  0ms
        -total-   -   512.00us  68.00us  132.00us  301.00us  0ms      0ms      0ms      11.00us  0ms    0ms        0ms      0ms  0ms
        -total-   -   499.00us  61.00us  131.00us  296.00us  0ms      0ms      0ms      11.00us  0ms    0ms        0ms      0ms  0ms
        -total-   -   521.00us  62.00us  132.00us  282.00us  34.00us  0ms      0ms      11.00us  0ms    0ms        0ms      0ms  0ms
        -total-   -   501.00us  63.00us  130.00us  296.00us  0ms      0ms      0ms      12.00us  0ms    0ms        0ms      0ms  0ms
        -total-   -   474.00us  61.00us  131.00us  271.00us  0ms      0ms      0ms      11.00us  0ms    0ms        0ms      0ms  0ms
        -total-   -   492.00us  64.00us  132.00us  285.00us  0ms      0ms      0ms      11.00us  0ms    0ms        0ms      0ms  0ms
        -total-   -   506.00us  72.00us  134.00us  289.00us  0ms      0ms      0ms      11.00us  0ms    0ms        0ms      0ms  0ms
        -total-   -   465.00us  60.00us  127.00us  267.00us  0ms      0ms      0ms      11.00us  0ms    0ms        0ms      0ms  0ms
        -total-   -   491.00us  65.00us  129.00us  286.00us  0ms      0ms      0ms      11.00us  0ms    0ms        0ms      0ms  0ms
        -total-   -   474.00us  61.00us  131.00us  271.00us  0ms      0ms      0ms      11.00us  0ms    0ms        0ms      0ms  0ms
        -total-   -   498.00us  61.00us  133.00us  293.00us  0ms      0ms      0ms      11.00us  0ms    0ms        0ms      0ms  0ms
        -total-   -   514.00us  63.00us  132.00us  308.00us  0ms      0ms      0ms      11.00us  0ms    0ms        0ms      0ms  0ms

    Thanks

    status/done customer 
    opened by josepaulog 17
  • Client workload statistics, monitoring

    Client workload statistics, monitoring

    Describe the solution you'd like:

    • Capability to track the workload patterns of NAS clients talking to a shared NAS volume, to identify bullies and victims on overloaded volumes.
    • Capability to show all clients mounting an NFS volume (i.e. TCP connections, NFS sessions); in other words, all NFS clients connected to the backend NFS storage.

    feature priority/P2 status/done customer 
    opened by demalik 16
  • Harvest should document which metrics each dashboard uses

    Harvest should document which metrics each dashboard uses

    Thanks to Chris Waltham on Discord for raising

    Results for commit 716e111a

    bin/grafana metrics
    
    7mode/aggregate7.json
    - aggr_inode_files_total
    - aggr_inode_files_used
    - aggr_inode_inodefile_private_capacity
    - aggr_inode_inodefile_public_capacity
    - aggr_inode_used_percent
    - aggr_labels
    - aggr_new_status
    - aggr_raid_disk_count
    - aggr_snapshot_files_total
    - aggr_snapshot_files_used
    - aggr_snapshot_inode_used_percent
    - aggr_snapshot_maxfiles_available
    - aggr_snapshot_maxfiles_possible
    - aggr_snapshot_maxfiles_used
    - aggr_snapshot_size_available
    - aggr_snapshot_size_used
    - aggr_snapshot_used_percent
    - aggr_space_available
    - aggr_space_sis_saved
    - aggr_space_sis_saved_percent
    - aggr_space_total
    - aggr_space_used
    - aggr_space_used_percent
    - aggr_volume_count_flexvol
    - node_labels
    
    7mode/cluster7.json
    - aggr_disk_max_busy
    - aggr_space_total
    - aggr_space_used
    - aggr_space_used_percent
    - cluster_subsystem_new_status
    - cluster_subsystem_outstanding_alerts
    - cluster_subsystem_suppressed_alerts
    - disk_busy
    - node_cpu_busy
    - node_labels
    - node_new_status
    - volume_avg_latency
    - volume_read_data
    - volume_total_ops
    - volume_write_data
    
    7mode/disk7.json
    - aggr_disk_busy
    - aggr_disk_max_busy
    - aggr_disk_max_total_transfers
    - aggr_raid_disk_count
    - aggr_space_total
    - aggr_space_used_percent
    - disk_labels
    - disk_sectors
    - disk_uptime
    - flashcache_disk_reads_replaced
    - flashcache_hit_percent
    - flashpool_read_ops_replaced
    - flashpool_read_ops_replaced_percent
    - hostadapter_bytes_read
    - hostadapter_bytes_written
    - node_disk_data_read
    - node_disk_data_written
    - node_labels
    - wafl_cp_count
    
    7mode/lun7.json
    - lun_avg_read_latency
    - lun_avg_write_latency
    - lun_labels
    - lun_new_status
    - lun_read_align_histo
    - lun_read_data
    - lun_read_ops
    - lun_size
    - lun_size_used
    - lun_write_align_histo
    - lun_write_data
    - lun_write_ops
    - node_labels
    - volume_size_used_percent
    
    7mode/network7.json
    - fcp_avg_read_latency
    - fcp_avg_write_latency
    - fcp_read_data
    - fcp_read_ops
    - fcp_total_data
    - fcp_total_ops
    - fcp_util_percent
    - fcp_write_data
    - fcp_write_ops
    - nic_labels
    - nic_new_status
    - nic_rx_bytes
    - nic_rx_total_errors
    - nic_tx_bytes
    - nic_tx_total_errors
    - nic_util_percent
    - node_labels
    
    7mode/node7.json
    - aggr_new_status
    - disk_busy
    - fcp_util_percent
    - fcp_write_data
    - iscsi_lif_avg_latency
    - iscsi_lif_iscsi_other_ops
    - nic_tx_bytes
    - nic_util_percent
    - node_avg_processor_busy
    - node_cifs_latency
    - node_cifs_op_count
    - node_cifs_ops
    - node_cpu_busy
    - node_fcp_ops
    - node_iscsi_ops
    - node_labels
    - node_new_status
    - node_nfs_latency
    - node_nfs_ops
    - node_nfs_read_avg_latency
    - node_nfs_read_ops
    - node_nfs_total_ops
    - node_nfs_write_avg_latency
    - node_nfs_write_ops
    - node_uptime
    - volume_avg_latency
    - volume_other_latency
    - volume_other_ops
    - volume_read_data
    - volume_read_latency
    - volume_read_ops
    - volume_total_ops
    - volume_write_data
    - volume_write_latency
    - volume_write_ops
    - wafl_cp_phase_times
    - wafl_read_io_type
    
    7mode/shelf7.json
    - shelf_fan_rpm
    - shelf_labels
    - shelf_new_status
    - shelf_sensor_reading
    - shelf_temperature_reading
    - shelf_voltage_reading
    
    7mode/volume7.json
    - node_labels
    - volume_avg_latency
    - volume_labels
    - volume_new_status
    - volume_read_data
    - volume_read_latency
    - volume_read_ops
    - volume_size_total
    - volume_size_used
    - volume_size_used_percent
    - volume_total_ops
    - volume_write_data
    - volume_write_latency
    - volume_write_ops
    
    cmode/aggregate.json
    - aggr_disk_busy
    - aggr_inode_files_total
    - aggr_inode_files_used
    - aggr_inode_inodefile_private_capacity
    - aggr_inode_inodefile_public_capacity
    - aggr_inode_used_percent
    - aggr_labels
    - aggr_logical_used_wo_snapshots
    - aggr_logical_used_wo_snapshots_flexclones
    - aggr_new_status
    - aggr_physical_used_wo_snapshots
    - aggr_physical_used_wo_snapshots_flexclones
    - aggr_raid_disk_count
    - aggr_snapshot_files_total
    - aggr_snapshot_files_used
    - aggr_snapshot_inode_used_percent
    - aggr_snapshot_maxfiles_available
    - aggr_snapshot_maxfiles_possible
    - aggr_snapshot_maxfiles_used
    - aggr_snapshot_reserve_percent
    - aggr_snapshot_size_available
    - aggr_snapshot_size_used
    - aggr_snapshot_used_percent
    - aggr_space_available
    - aggr_space_capacity_tier_used
    - aggr_space_data_compaction_saved
    - aggr_space_data_compaction_saved_percent
    - aggr_space_physical_used
    - aggr_space_sis_saved
    - aggr_space_sis_saved_percent
    - aggr_space_total
    - aggr_space_used
    - aggr_space_used_percent
    - aggr_total_logical_used
    - aggr_total_physical_used
    - aggr_volume_count_flexvol
    - node_labels
    
    cmode/cdot.json
    - aggr_space_total
    - aggr_space_used
    - node_cifs_ops
    - node_cpu_busy
    - node_labels
    - node_nfs_ops
    - node_volume_avg_latency
    - svm_labels
    - svm_vol_avg_latency
    - svm_vol_read_data
    - svm_vol_total_ops
    - svm_vol_write_data
    - volume_avg_latency
    - volume_labels
    - volume_read_data
    - volume_size_total
    - volume_size_used
    - volume_total_ops
    - volume_write_data
    
    cmode/cluster.json
    - aggr_disk_busy
    - aggr_logical_used_wo_snapshots
    - aggr_logical_used_wo_snapshots_flexclones
    - aggr_physical_used_wo_snapshots
    - aggr_physical_used_wo_snapshots_flexclones
    - aggr_space_total
    - aggr_space_used
    - aggr_space_used_percent
    - aggr_total_logical_used
    - aggr_total_physical_used
    - cluster_new_status
    - cluster_subsystem_new_status
    - cluster_subsystem_outstanding_alerts
    - cluster_subsystem_suppressed_alerts
    - environment_sensor_average_ambient_temperature
    - environment_sensor_average_fan_speed
    - environment_sensor_average_temperature
    - environment_sensor_max_fan_speed
    - environment_sensor_max_temperature
    - environment_sensor_min_ambient_temperature
    - environment_sensor_min_fan_speed
    - environment_sensor_min_temperature
    - environment_sensor_power
    - node_avg_processor_busy
    - node_cpu_busy
    - node_disk_busy
    - node_disk_max_busy
    - node_labels
    - node_new_status
    - node_volume_avg_latency
    - node_volume_read_data
    - node_volume_read_latency
    - node_volume_total_ops
    - node_volume_write_data
    - node_volume_write_latency
    - svm_vol_avg_latency
    - svm_vol_read_data
    - svm_vol_total_ops
    - svm_vol_write_data
    - volume_avg_latency
    - volume_read_data
    - volume_total_ops
    - volume_write_data
    
    cmode/compliance.json
    - cluster_peer_labels
    - cluster_peer_non_encrypted
    - ntpserver_labels
    - security_account_activediruser
    - security_account_certificateuser
    - security_account_labels
    - security_account_ldapuser
    - security_account_localuser
    - security_account_samluser
    - security_audit_destination_status
    - security_certificate_labels
    - security_labels
    - security_login_labels
    - security_ssh_labels
    - support_labels
    - svm_labels
    - svm_ldap_encrypted
    - svm_ldap_signed
    
    cmode/data_protection_snapshot.json
    - snapshot_policy_total_schedules
    - volume_labels
    - volume_snapshot_count
    - volume_snapshot_reserve_size
    - volume_snapshots_size_used
    
    cmode/disk.json
    - aggr_disk_busy
    - aggr_disk_max_busy
    - aggr_disk_max_total_transfers
    - aggr_disk_max_user_read_chain
    - aggr_disk_max_user_write_chain
    - aggr_raid_disk_count
    - aggr_space_total
    - aggr_space_used_percent
    - disk_labels
    - disk_sectors
    - disk_stats_average_latency
    - disk_stats_io_kbps
    - disk_uptime
    - flashcache_disk_reads_replaced
    - flashcache_hit_percent
    - flashpool_read_ops_replaced
    - flashpool_read_ops_replaced_percent
    - hostadapter_bytes_read
    - hostadapter_bytes_written
    - node_disk_data_read
    - node_disk_data_written
    - node_labels
    - node_vol_write_latency
    - wafl_cp_count
    
    cmode/headroom.json
    - headroom_aggr_current_latency
    - headroom_aggr_current_ops
    - headroom_aggr_current_utilization
    - headroom_aggr_optimal_point_latency
    - headroom_aggr_optimal_point_ops
    - headroom_aggr_optimal_point_utilization
    - headroom_cpu_current_latency
    - headroom_cpu_current_ops
    - headroom_cpu_current_utilization
    - headroom_cpu_optimal_point_latency
    - headroom_cpu_optimal_point_ops
    - headroom_cpu_optimal_point_utilization
    - volume_labels
    
    cmode/lun.json
    - lun_avg_read_latency
    - lun_avg_write_latency
    - lun_caw_reqs
    - lun_labels
    - lun_new_status
    - lun_read_align_histo
    - lun_read_data
    - lun_read_ops
    - lun_remote_bytes
    - lun_remote_ops
    - lun_size
    - lun_size_used
    - lun_unmap_reqs
    - lun_write_align_histo
    - lun_write_data
    - lun_write_ops
    - lun_writesame_reqs
    - lun_writesame_unmap_reqs
    - lun_xcopy_reqs
    - node_labels
    - qos_detail_volume_resource_latency
    - volume_sis_compress_saved_percent
    - volume_sis_dedup_saved_percent
    - volume_size_used_percent
    - volume_snapshot_reserve_used_percent
    
    cmode/mcc_cluster.json
    - aggr_disk_max_busy
    - aggr_new_status
    - fcvi_rdma_write_avg_latency
    - fcvi_rdma_write_ops
    - fcvi_rdma_write_throughput
    - hostadapter_bytes_read
    - hostadapter_bytes_written
    - node_avg_processor_busy
    - node_labels
    - path_read_data
    - path_read_iops
    - path_read_latency
    - path_write_data
    - path_write_iops
    - path_write_latency
    - plex_disk_busy
    - plex_disk_user_read_latency
    - plex_disk_user_reads
    - plex_disk_user_write_latency
    - plex_disk_user_writes
    - volume_avg_latency
    
    cmode/metadata.json
    - metadata_collector_api_time
    - metadata_collector_calc_time
    - metadata_collector_metrics
    - metadata_collector_parse_time
    - metadata_collector_poll_time
    - metadata_component_count
    - metadata_component_status
    - metadata_exporter_count
    - metadata_exporter_time
    - metadata_target_goroutines
    - metadata_target_ping
    - metadata_target_status
    - poller_cpu
    - poller_cpu_percent
    - poller_fds
    - poller_io
    - poller_memory
    - poller_memory_percent
    - poller_net
    - poller_status
    - poller_threads
    
    cmode/network.json
    - fcp_avg_read_latency
    - fcp_avg_write_latency
    - fcp_discarded_frames_count
    - fcp_int_count
    - fcp_invalid_crc
    - fcp_invalid_transmission_word
    - fcp_isr_count
    - fcp_link_down
    - fcp_link_failure
    - fcp_loss_of_signal
    - fcp_loss_of_sync
    - fcp_nvmf_avg_read_latency
    - fcp_nvmf_avg_write_latency
    - fcp_nvmf_read_data
    - fcp_nvmf_read_ops
    - fcp_nvmf_total_data
    - fcp_nvmf_total_ops
    - fcp_nvmf_write_data
    - fcp_nvmf_write_ops
    - fcp_prim_seq_err
    - fcp_queue_full
    - fcp_read_data
    - fcp_spurious_int_count
    - fcp_threshold_full
    - fcp_total_data
    - fcp_util_percent
    - fcp_write_data
    - nic_labels
    - nic_new_status
    - nic_rx_alignment_errors
    - nic_rx_bytes
    - nic_rx_crc_errors
    - nic_rx_length_errors
    - nic_rx_total_errors
    - nic_tx_bytes
    - nic_tx_hw_errors
    - nic_tx_total_errors
    - nic_util_percent
    - node_labels
    
    cmode/nfs4storePool.json
    - nfs_diag_storePool_ByteLockAlloc
    - nfs_diag_storePool_ByteLockMax
    - nfs_diag_storePool_ClientAlloc
    - nfs_diag_storePool_ClientMax
    - nfs_diag_storePool_ConnectionParentSessionReferenceAlloc
    - nfs_diag_storePool_ConnectionParentSessionReferenceMax
    - nfs_diag_storePool_CopyStateAlloc
    - nfs_diag_storePool_CopyStateMax
    - nfs_diag_storePool_DelegAlloc
    - nfs_diag_storePool_DelegMax
    - nfs_diag_storePool_DelegStateAlloc
    - nfs_diag_storePool_DelegStateMax
    - nfs_diag_storePool_LayoutAlloc
    - nfs_diag_storePool_LayoutMax
    - nfs_diag_storePool_LayoutStateAlloc
    - nfs_diag_storePool_LayoutStateMax
    - nfs_diag_storePool_LockAlloc
    - nfs_diag_storePool_LockMax
    - nfs_diag_storePool_LockStateAlloc
    - nfs_diag_storePool_LockStateMax
    - nfs_diag_storePool_OpenAlloc
    - nfs_diag_storePool_OpenMax
    - nfs_diag_storePool_OpenStateAlloc
    - nfs_diag_storePool_OpenStateMax
    - nfs_diag_storePool_OwnerAlloc
    - nfs_diag_storePool_OwnerMax
    - nfs_diag_storePool_SessionAlloc
    - nfs_diag_storePool_SessionConnectionHolderAlloc
    - nfs_diag_storePool_SessionConnectionHolderMax
    - nfs_diag_storePool_SessionHolderAlloc
    - nfs_diag_storePool_SessionHolderMax
    - nfs_diag_storePool_SessionMax
    - nfs_diag_storePool_StateRefHistoryAlloc
    - nfs_diag_storePool_StateRefHistoryMax
    - nfs_diag_storePool_StringAlloc
    - nfs_diag_storePool_StringMax
    - node_labels
    
    cmode/nfs_clients.json
    - nfs_clients_idle_duration
    - volume_labels
    
    cmode/node.json
    - aggr_new_status
    - fcp_lif_avg_latency
    - fcp_lif_total_ops
    - fcp_lif_write_data
    - fcp_nvmf_read_data
    - fcp_nvmf_write_data
    - fcp_read_data
    - fcp_util_percent
    - fcp_write_data
    - iscsi_lif_avg_latency
    - iscsi_lif_iscsi_other_ops
    - iscsi_lif_write_data
    - nic_rx_bytes
    - nic_tx_bytes
    - nic_util_percent
    - node_avg_processor_busy
    - node_cifs_connections
    - node_cifs_established_sessions
    - node_cifs_latency
    - node_cifs_op_count
    - node_cifs_open_files
    - node_cifs_ops
    - node_cifs_signed_sessions
    - node_cpu_busy
    - node_cpu_domain_busy
    - node_disk_busy
    - node_disk_max_busy
    - node_failed_fan
    - node_failed_power
    - node_fcp_ops
    - node_iscsi_ops
    - node_labels
    - node_new_status
    - node_nfs_latency
    - node_nfs_ops
    - node_nfs_read_avg_latency
    - node_nfs_read_ops
    - node_nfs_read_throughput
    - node_nfs_throughput
    - node_nfs_total_ops
    - node_nfs_write_avg_latency
    - node_nfs_write_ops
    - node_nfs_write_throughput
    - node_nvmf_ops
    - node_uptime
    - nvme_lif_avg_latency
    - nvme_lif_total_ops
    - nvme_lif_write_data
    - volume_avg_latency
    - volume_other_latency
    - volume_other_ops
    - volume_read_data
    - volume_read_latency
    - volume_read_ops
    - volume_total_ops
    - volume_write_data
    - volume_write_latency
    - volume_write_ops
    - wafl_cp_phase_times
    - wafl_read_io_type
    
    cmode/power.json
    - environment_sensor_average_ambient_temperature
    - environment_sensor_average_fan_speed
    - environment_sensor_average_temperature
    - environment_sensor_max_fan_speed
    - environment_sensor_max_temperature
    - environment_sensor_min_ambient_temperature
    - environment_sensor_min_fan_speed
    - environment_sensor_min_temperature
    - environment_sensor_power
    - node_labels
    - shelf_average_ambient_temperature
    - shelf_average_fan_speed
    - shelf_average_temperature
    - shelf_disk_count
    - shelf_labels
    - shelf_max_fan_speed
    - shelf_max_temperature
    - shelf_min_ambient_temperature
    - shelf_min_fan_speed
    - shelf_min_temperature
    - shelf_new_status
    - shelf_power
    
    cmode/qtree.json
    - qtree_cifs_ops
    - qtree_internal_ops
    - qtree_labels
    - qtree_nfs_ops
    - qtree_total_ops
    - quota_disk_used
    - quota_files_used
    - volume_labels
    
    cmode/quotaReport.json
    - quota_disk_limit
    - quota_disk_used
    - quota_disk_used_pct_disk_limit
    - quota_file_limit
    - quota_files_used
    - quota_files_used_pct_file_limit
    - quota_soft_disk_limit
    - quota_soft_file_limit
    - volume_labels
    
    cmode/security.json
    - security_account_activediruser
    - security_account_certificateuser
    - security_account_ldapuser
    - security_account_localuser
    - security_account_samluser
    - security_certificate_labels
    - svm_labels
    - volume_labels
    
    cmode/shelf.json
    - shelf_average_ambient_temperature
    - shelf_average_fan_speed
    - shelf_average_temperature
    - shelf_disk_count
    - shelf_fan_rpm
    - shelf_labels
    - shelf_max_fan_speed
    - shelf_max_temperature
    - shelf_min_ambient_temperature
    - shelf_min_fan_speed
    - shelf_min_temperature
    - shelf_new_status
    - shelf_power
    - shelf_psu_power_drawn
    - shelf_psu_power_rating
    - shelf_sensor_reading
    - shelf_temperature_reading
    - shelf_voltage_reading
    
    cmode/snapmirror.json
    - snapmirror_break_failed_count
    - snapmirror_break_successful_count
    - snapmirror_labels
    - snapmirror_lag_time
    - snapmirror_last_transfer_duration
    - snapmirror_last_transfer_size
    - snapmirror_resync_failed_count
    - snapmirror_resync_successful_count
    - snapmirror_update_failed_count
    - snapmirror_update_successful_count
    - volume_labels
    
    cmode/svm.json
    - copy_manager_kb_copied
    - fcp_lif_avg_latency
    - fcp_lif_avg_other_latency
    - fcp_lif_avg_read_latency
    - fcp_lif_avg_write_latency
    - fcp_lif_other_ops
    - fcp_lif_read_data
    - fcp_lif_read_ops
    - fcp_lif_total_ops
    - fcp_lif_write_data
    - fcp_lif_write_ops
    - iscsi_lif_avg_latency
    - iscsi_lif_avg_other_latency
    - iscsi_lif_avg_read_latency
    - iscsi_lif_avg_write_latency
    - iscsi_lif_iscsi_other_ops
    - iscsi_lif_iscsi_read_ops
    - iscsi_lif_iscsi_write_ops
    - iscsi_lif_read_data
    - iscsi_lif_write_data
    - lif_recv_data
    - lif_sent_data
    - nvme_lif_avg_latency
    - nvme_lif_avg_other_latency
    - nvme_lif_avg_read_latency
    - nvme_lif_avg_write_latency
    - nvme_lif_other_ops
    - nvme_lif_read_data
    - nvme_lif_read_ops
    - nvme_lif_total_ops
    - nvme_lif_write_data
    - nvme_lif_write_ops
    - qos_detail_resource_latency
    - qos_latency
    - qos_ops
    - qos_read_data
    - qos_read_latency
    - qos_read_ops
    - qos_sequential_reads
    - qos_sequential_writes
    - qos_write_data
    - qos_write_latency
    - qos_write_ops
    - svm_cifs_connections
    - svm_cifs_latency
    - svm_cifs_op_count
    - svm_cifs_open_files
    - svm_cifs_read_latency
    - svm_cifs_read_ops
    - svm_cifs_write_latency
    - svm_cifs_write_ops
    - svm_nfs_latency
    - svm_nfs_ops
    - svm_nfs_read_avg_latency
    - svm_nfs_read_ops
    - svm_nfs_read_throughput
    - svm_nfs_read_total
    - svm_nfs_throughput
    - svm_nfs_write_avg_latency
    - svm_nfs_write_ops
    - svm_nfs_write_throughput
    - svm_nfs_write_total
    - svm_read_total
    - svm_vol_avg_latency
    - svm_vol_other_latency
    - svm_vol_other_ops
    - svm_vol_read_data
    - svm_vol_read_latency
    - svm_vol_read_ops
    - svm_vol_total_ops
    - svm_vol_write_data
    - svm_vol_write_latency
    - svm_vol_write_ops
    - svm_vscan_connections_active
    - svm_vscan_dispatch_latency
    - svm_vscan_scan_latency
    - svm_vscan_scan_noti_received_rate
    - svm_vscan_scan_request_dispatched_rate
    - svm_write_total
    - volume_labels
    - volume_read_data
    - volume_read_latency
    - volume_read_ops
    - volume_sis_compress_saved
    - volume_sis_compress_saved_percent
    - volume_sis_dedup_saved
    - volume_sis_dedup_saved_percent
    - volume_sis_total_saved
    - volume_size_used_percent
    - volume_snapshot_reserve_used_percent
    - volume_write_data
    - volume_write_latency
    - volume_write_ops
    
    cmode/volume.json
    - fabricpool_cloud_bin_op_latency_average
    - fabricpool_cloud_bin_operation
    - qos_detail_volume_resource_latency
    - qos_volume_read_data
    - qos_volume_read_latency
    - qos_volume_read_ops
    - qos_volume_sequential_reads
    - qos_volume_sequential_writes
    - qos_volume_write_data
    - qos_volume_write_latency
    - qos_volume_write_ops
    - volume_avg_latency
    - volume_labels
    - volume_new_status
    - volume_read_data
    - volume_read_latency
    - volume_read_ops
    - volume_sis_compress_saved
    - volume_sis_dedup_saved
    - volume_size
    - volume_size_total
    - volume_size_used
    - volume_size_used_percent
    - volume_snapshot_reserve_available
    - volume_snapshot_reserve_percent
    - volume_snapshot_reserve_size
    - volume_snapshot_reserve_used_percent
    - volume_snapshots_size_available
    - volume_snapshots_size_used
    - volume_space_logical_used
    - volume_space_logical_used_percent
    - volume_space_physical_used
    - volume_space_physical_used_percent
    - volume_total_ops
    - volume_write_data
    - volume_write_latency
    - volume_write_ops
    
    storagegrid/tenant.json
    - bucket_bytes
    - bucket_objects
    - tenant_labels
    - tenant_logical_quota
    - tenant_logical_used
    - tenant_objects
    - tenant_used_percent
    
    feature status/open customer 23.02 
    opened by cgrinds 0
  • feat: added dashboard tests for legends details

    feat: added dashboard tests for legends details

    This PR covers these:

    • calcs: should include at minimum {mean, lastNotNull, max}; exception: when sum exists, the other calculations may be absent
    • displayMode: should be table, exception: hidden
    • placement: bottom

    Case 1, for missing calculations:

        dashboard_test.go:704: dashboard=cmode/snapmirror.json, panel=Destination Relationships per Node, calculation section(s) mean not found
        dashboard_test.go:704: dashboard=cmode/snapmirror.json, panel=Destination - Break Operations, calculation section(s) mean not found
        dashboard_test.go:704: dashboard=cmode/snapmirror.json, panel=Destination - Resync Operations, calculation section(s) mean not found
        dashboard_test.go:704: dashboard=cmode/snapmirror.json, panel=Destination - Update Operations, calculation section(s) mean not found
        dashboard_test.go:704: dashboard=cmode/snapmirror.json, panel=Source Relationships per Node, calculation section(s) mean not found
        dashboard_test.go:704: dashboard=cmode/snapmirror.json, panel=Source - Break Operations, calculation section(s) mean not found
        dashboard_test.go:704: dashboard=cmode/snapmirror.json, panel=Source - Resync Operations, calculation section(s) mean not found
        dashboard_test.go:704: dashboard=cmode/snapmirror.json, panel=Source - Update Operations, calculation section(s) mean not found
        dashboard_test.go:704: dashboard=cmode/snapmirror.json, panel=Source Relationships per SVM, calculation section(s) mean not found
        dashboard_test.go:704: dashboard=cmode/snapmirror.json, panel=Destination Relationships per SVM, calculation section(s) mean not found
    
    

    Cases 2 and 3, for displayMode and placement:

        dashboard_test.go:696: dashboard=cmode/snapmirror.json, panel=Volume relationship count by relationship health, legend placement want=bottom got=right val {
                      "displayMode": "hidden",
                      "placement": "right",
                      "values": [
                        "value"
                      ]
                    }
    
    
    cla-signed 
    opened by Hardikl 1
  • FSA integration

    FSA integration

    It would be nice to see some File System Analytics integration in Harvest, especially metrics on directory growth / change and perhaps least / most accessed data.

    feature customer 
    opened by aticatac 1
Releases(nightly)
  • nightly (Dec 1, 2022)

  • v22.11.0 (Nov 21, 2022)

    22.11.0 / 2022-11-21

    :pushpin: Highlights of this major release include:

    • :sparkles: Harvest now includes a StorageGRID collector and a Tenant/Buckets dashboard. We're just getting started with StorageGRID dashboards. Please give the collector a try, and let us know which StorageGRID dashboards you'd like to see next.

    • :tophat: The REST collectors are ready! We recommend using them for ONTAP versions 9.12.1 and higher. Today, Harvest collects 1,546 metrics via ZAPI. Harvest includes a full set of REST templates that export identical metrics. All 1,546 metrics are available via Harvest's REST templates, and no changes to dashboards or downstream metric consumers are required. :tada: More details on Harvest's REST strategy.

    • :closed_book: Harvest has a new documentation site! This consolidates Harvest documentation into one place and will make it easier to find what you need. Stay tuned for more updates here.

    • :gem: New and improved dashboards

      • cDOT, high-level cluster overview dashboard
      • Headroom dashboard
      • Quota dashboard
      • Snapmirror dashboard shows source and destination side of relationship
      • NFS clients dashboard
      • Fabric Pool panels are now included in Volume dashboard
      • Tags are included for all default dashboards, making it easier to find what you need
      • Additional throughput, ops, and utilization panels were added to the Aggregate, Disk, and Clusters dashboards
      • Harvest dashboards were updated to enable multi-select variables, shared crosshairs, and better top-n resources support; all variables are sorted by default.
    • :lock: Harvest code is checked for vulnerabilities on every commit using Go's vulnerability management scanner.

    • Harvest collects additional metrics in this release

      • ONTAP S3 server config metrics
      • User defined volume workload
      • Active network connections
      • NFS connected clients
      • Network ports
      • Netstat packet loss
    • Harvest now converts ONTAP histograms to Prometheus histograms, making it possible to visualize metrics as heatmaps in Grafana
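    With the buckets exported as Prometheus histograms (for example, the netapp_perf_cifs_vserver_cifs_latency_hist_bucket series shown earlier), standard PromQL histogram functions apply. A sketch of a percentile query, with illustrative label selectors:

    ```promql
    # Estimate the p95 CIFS latency per SVM from the exported buckets
    # (metric and label names here follow the example series above).
    histogram_quantile(0.95,
      sum by (vserver_name, le) (
        rate(netapp_perf_cifs_vserver_cifs_latency_hist_bucket[5m])
      )
    )
    ```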

    Announcements

    :bangbang: IMPORTANT NetApp moved their communities from Slack to Discord, please join us there!

    :bomb: Deprecation: Earlier versions of Harvest published quota metrics prefixed with qtree. Harvest release 22.11 deprecates the quota metrics prefixed with qtree and instead publishes quota metrics prefixed with quota. All dashboards have been updated. If you are consuming these metrics outside the default dashboards, please change to quota prefixed metrics. Harvest release 23.02 will remove the deprecated quota metrics prefixed with qtree.

    :bangbang: IMPORTANT If using Docker Compose and you want to keep your historical Prometheus data, please read how to migrate your Prometheus volume

    :bulb: IMPORTANT After upgrade, don't forget to re-import your dashboards, so you get all the new enhancements and fixes. You can import them via the bin/harvest/grafana import CLI, from the Grafana UI, or from the Maintenance > Reset Harvest Dashboards button in NAbox.

    Known Issues

    • Harvest does not calculate power metrics for AFF A250 systems. This data is not available from ONTAP via ZAPI or REST. See ONTAP bug 1511476 for more details.

    • ONTAP does not include REST metrics for offbox_vscan_server and offbox_vscan until ONTAP 9.13.1. See ONTAP bug 1473892 for more details.

    • Podman is unable to pull from NetApp's container registry cr.netapp.io. Until this issue is resolved, Podman users can pull from a separate proxy like this: podman pull netappdownloads.jfrog.io/oss-docker-harvest-production/harvest:latest.

    • 7-mode filers that are not on the latest release of ONTAP may experience TLS connection issues, with errors like "tls: server selected unsupported protocol version 301". This is caused by a change in Go 1.18, which raised the default minimum TLS version for client connections to TLS 1.2. Please upgrade your 7-mode filers (recommended) or set tls_min_version: tls10 in your harvest.yml poller section. See #1007 for more details.
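    The tls_min_version workaround mentioned above would look roughly like this in harvest.yml; the poller name and address are placeholders:

    ```yaml
    Pollers:
      my-7mode-filer:           # placeholder poller name
        addr: 10.0.0.1          # placeholder filer address
        tls_min_version: tls10  # allow TLS 1.0 for older 7-mode filers
    ```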

    • The Unix collector is unable to monitor pollers running in containers. See #249 for details.

    Thanks to all the awesome contributors

    :metal: Thanks to all the people who've opened issues, asked questions on Discord, and contributed code or dashboards this release:

    @Falcon667, @MrObvious, @ReneMeier, @Sawall10, @T1r0l, @chadpruden, @demalik, @electrocreative, @ev1963, @faguayot, @iStep2Step, @jgasher, @jmg011, @mamoep, @matthieu-sudo, @merdos, @rodenj1, Ed Wilts, KlausHub, MeghanaD, Paul P2, Rusty Brown, Shubham Mer, jf38800, rcl23, troysmuller

    :seedling: This release includes 59 features, 90 bug fixes, 21 documentation, 4 testing, 2 styling, 6 refactoring, 2 miscellaneous, and 6 ci commits.

    :rocket: Features

    • Enable Multi Select By Default (#1213)
    • Merge Release 22.08 To Main (#1218)
    • Add Avg Cifs Latency To Svm Dashboard Graph Panel (#1221)
    • Network Port Templates (#1231)
    • Add Node Cpu Busy To Cluster Dashboard (#1243)
    • Improve Poller Startup Logging (#1254)
    • Add Net Connections Template For Rest Collector (#1257)
    • Upgrade Zapi Collector To Rest When The Ontap Version Is >= 9.12.1 (#1261)
    • Run Govulncheck On Make Dev (#1273)
    • Nfsv42 Restperf Templates (#1275)
    • Enable User Defined Volume Workload (#1276)
    • Prometheus Exporter Should Log Address And Port (#1279)
    • Ensure Dashboard Units Align With Ontap's Units (#1280)
    • Panels Should Connect Null Values (#1281)
    • Harvest Should Collect Ontap S3 Server Metrics (#1285)
    • Bin/Zapi Show Counters Should Print Xml Results To Make Parsi… (#1286)
    • Harvest Should Collect Ontap S3 Server Config Metrics (#1287)
    • Harvest Should Publish Cooked Zero Performance Metrics (#1292)
    • Add Grafana Tags On Default Dashboards (#1293)
    • Add Harvest Tags (#1294)
    • Rest Nfs Connections Dashboard (#1297)
    • Cmd Line Objects And Collectors Override Defaults (#1300)
    • Harvest Should Replace Topk With Topk Range In All Dashboards Part 1 (#1301)
    • Harvest Should Replace Topk With Topk Range In All Dashboards Part 2 (#1302)
    • Harvest Should Replace Topk With Topk Range In All Dashboards Part 3 (#1304)
    • Snapmirror From Source Side [Zapi Changes] (#1307)
    • Mcc Plex Panel Fix (#1310)
    • Add Support For Qos Min And Cp In Harvest (#1316)
    • Add Available Ops To Headroom Dashboard (#1317)
    • Added Panels In Cluster, Disk For 1.6 Parity (#1320)
    • Add Storagegrid Collector And Dashboard (#1322)
    • Export Ontap Histograms As Prometheus Histograms (#1326)
    • Solution Based Cdot Dashboard (#1336)
    • Cluster Var Changed To Source_cluster In Snapmirror Dashboard (#1337)
    • Remove Pollinstance From Zapi Collector (#1338)
    • Reduce Memory Footprint Of Set (#1339)
    • Quota Metric Renaming (#1345)
    • Collectors Should Log Polldata, Plugin Times, And Metadata (#1347)
    • Export Ontap Histograms As Prometheus Histograms (#1349)
    • Fabricpool Panels - Parity With 1.6 (#1352)
    • All Dashboards Should Default To Shared Crosshair (#1359)
    • All Dashboards Should Use Multi-Select Dropdowns For Each Variable (#1363)
    • Perf Collector Unit Test Cases (#1373)
    • Remove Metric Labels From Shelf Sensor Plugins (#1378)
    • Envvar Harvest_no_upgrade To Skip Collector Upgrade (#1385)
    • Rest Collector Should Not Log When Client_timeout Is Missing (#1387)
    • Enable Rest Collector Templates (#1391)
    • Harvest Should Use Rest Unconditionally Starting With 9.13.1 (#1394)
    • Rest Perf Template Fixes (#1395)
    • Only Allow One Config/Perf Collector Per Object (#1410)
    • Histogram Support For Restperf (#1412)
    • Volume Tag Plugin (#1417)
    • Add Rest Validation In Ci (#1421)
    • Add Netstat Metrics For Packet Loss (#1423)
    • Add Datacenter To Metadata (#1427)
    • Increase Dashboard Quality With More Tests (#1460)
    • Add Node_disk_data_read To Units.yaml (#1480)
    • Tag Fsx Dashboards (#1490)
    • Restperf Submetric (#1506)

    :bug: Bug Fixes

    • Log.fatalln Will Exit, And Defer Resp.body.close() Will Not Run (#1211)
    • Remove Rewrite_as_label From Templates (#1212)
    • Set User To Uid If Name Is Missing (#1223)
    • Duplicate Instance Issue Quota (#1225)
    • Create Unique Indexes For Quota Dashboard (#1226)
    • Volume Dashboard Should Use Iec Bytes (#1229)
    • Skipped Bookend Ems Whose Key Is Node-Name (#1237)
    • Bin/Rest Should Support Verbose And Return Error When (#1240)
    • Volume Plugin Should Not Fail When Snapmirror Has Empty Relationship_id (#1241)
    • Volume.go Plugin Should Check No Instances (#1253)
    • Remove Power 24H Panel From Shelf Dashboard (#1256)
    • Govulncheck Scan Issue Go-2021-0113 (#1259)
    • Negative Counter Fix And Zero Suppression (#1260)
    • Remove User_id To Reduce Memory Load From Quota (#1263)
    • Snapmirror Relationships From Source Side (#1266)
    • Flashcache Dashboard Units Are Incorrect (#1268)
    • Disable User,Group Quota By Default (#1271)
    • Enable Dashboard Check In Ci (#1277)
    • Http Sd Should Publish Local Ip When Exporter Is Blank (#1278)
    • Headroom Dashboard Utilization Should Be In Percentage (#1290)
    • Simple Poller Should Use Int64 Metric (#1291)
    • Remove Label Warning From Rest Collector (#1299)
    • Ignore Negative Perf Deltas (#1303)
    • Mcc Plex Panel Fix Rest Template (#1313)
    • Remove Duplicate Network Dashboards (#1314)
    • 7Mode Zapi Cli Issue Due To Max (#1321)
    • Add Scale To Headroom Dashboard (#1323)
    • Increase Default Zapi Timeout To 30 Seconds (#1333)
    • Zapiperf Lun Name Should Match Zapi (#1341)
    • Record Number Of Zapi Instances In Polldata (#1343)
    • Rest Metric Count (#1346)
    • Aggregator.go Should Not Change Histogram Properties To Avg (#1348)
    • Ci Ems Issue (#1350)
    • Add Node In Warning Logs For Power Calculation (#1351)
    • Align Aggregate Disk Utilization Panel (#1355)
    • Correct Skip Count For Perf Percent Property (#1358)
    • Harvest Should Keep Same Volume Name During Upgrade In Docker-Compose Workflow (#1361)
    • Zapi Polldata Logged The Wrong Number Of Instances During Batch … (#1366)
    • Top Latency Units Should Be Microseconds Not Milliseconds (#1371)
    • Calculate Power From Voltage And Current Sensors When Power Units Are Not Known (#1372)
    • Don't Add Units As Metric Labels Since It Breaks Influxdb Exporter (#1376)
    • Handle Raidgroup/Plex Alongwith Other Changes (#1380)
    • Disable Color Console Logging (#1382)
    • Restperf Lun Name Should Match Zapi (#1390)
    • Cluster Dashboard Panel Changes (#1393)
    • Harvest Should Use Template Display Name When Exporting Histograms (#1403)
    • Rest Collector Should Collect Cluster Level Instances (#1404)
    • Remove Protected,Protectionrole,All_healthy Labels From Volume (#1406)
    • Snapmirror Dashboard Changes (#1407)
    • System Node Perf Template Fix (#1409)
    • Svm Records Count (#1411)
    • Dont Export Constituents Relationships In Sm (#1414)
    • Handle Ls Relationships + Handle Dashboard (#1415)
    • Snapmirror Dashboard Should Not Show Id Column (#1416)
    • Handle Error When No Instances Found For Plugins In Rest (#1428)
    • Handle Batching In Shelf Plugin (#1429)
    • Lun Rest Perf Template Fixes (#1430)
    • Handle Volume Panels (#1431)
    • Align Rest Start Up Logging As Zapi (#1435)
    • Handle Aggr_space_used_percent In Aggr (#1439)
    • Sensor Plugin Rest Changes (#1440)
    • Disk Dashboard - Variables Are Not Sorted (#1443)
    • Add Missing Labels For Rest Zapi Diff (#1445)
    • Shelf Child Obj - Status Ok To Normal (#1451)
    • Restperf Key Handling (#1452)
    • Rest Zapi Diff Error Handling (#1453)
    • Restperf Fcp Template Mapping Fix (#1455)
    • Disk Type In Lower Case In Rest (#1456)
    • Power Fix For Cold Sensors (#1464)
    • Svm With Private Cli (#1465)
    • Storagegrid Collector Should Use Metadata Collection (#1468)
    • Fix New Line Char In Headroom Dashboard (#1473)
    • New_status Gap Issue For Cluster Scoped Zapi Call (#1477)
    • Merge To Main From Release (#1479)
    • Tenant Column Should Be In "Tenants And Buckets" Table Once (#1483)
    • Fsx Headroom Dashboard Support For Rest Collector (#1484)
    • Qos Rest Template Fix (#1487)
    • Net Port Template Fix (#1488)
    • Disable Netport Rest Template (#1491)
    • Rest Sensor Template Fix (#1492)
    • Fix Background Color In Cluster, Aggregate Panels (#1496)
    • Smv_labels Missing In Zapi (#1499)
    • 7Mode Shelf Plugin Fix (#1500)
    • Remove Queue_full Counter From Namespace Template (#1501)
    • Storagegrid Bucket Plugin Should Honor Client Timeout (#1503)
    • Snapmirror Warn To Trace Log (#1504)
    • Cdot Svm Panels (#1515)
    • Svm Nfsv4 Panels Fix (#1518)
    • Svm Copy Panel Fix (#1520)
    • Exclude Empty Qtree In Restperf Template Through Regex (#1522)

    :closed_book: Documentation

    • Rest Strategy Doc (#1234)
    • Improve Security Panel Info For Ontap 9.10+ (#1238)
    • Explain How To Enable Qos Collection When Using Least-Privilege… (#1249)
    • Clarify When Harvest Defaults To Rest (#1252)
    • Spelling Correction (#1318)
    • Add Ems Alert To Ems Documentation (#1319)
    • Explain How To Log To File With Systemd Instantiated Service (#1325)
    • Add Help Text In Nfs Clients Dashboard About Enabling Rest Collector (#1334)
    • Describe How To Migrate Historical Prometheus Data Generated Be… (#1369)
    • Explain What To Do If Zapi Metrics Are Missing In Rest (#1389)
    • Move Documentation To Separate Site (#1433)
    • Readme Should Point To Https://Netapp.github.io/Harvest/ (#1434)
    • Remove Unneeded Readme.md Files (#1438)
    • Fix Image Links (#1441)
    • Restore Rest Strategy And Migrate Docs (#1463)
    • Explain What Negative Available Ops Means (#1469)
    • Explain What Negative Available Ops Means (#1469) (#1471)
    • Explain What Negative Available Ops Means (#1474)
    • Add Amazon Fsx For Ontap Documentation (#1485)
    • Include Docker Compose Workflow In Docs (#1507)
    • Docker Upgrade Instructions (#1511)

    :wrench: Testing

    • Add Unit Test That Finds Metrics Used In Dashboards With Confli… (#1381)
    • Ensure Resetinstance Causes Metric To Be Skipped (#1388)
    • Validate Sensor Template Fix (#1486)
    • Add Unit Test To Detect Topk Without Range (#1495)

    Styling

    • Address Shellcheck Strong Warnings (#1228)
    • Correct Spelling And Lint Warning (#1332)

    Refactoring

    • Use Map Instead Of Loop For Targetisontap (#1235)
    • Remove Unused And Lightly-Used Metrics (#1274)
    • Remove Warnings (#1298)
    • Simplify Loadcollector (#1329)
    • Simplify Set Api (#1340)
    • Move Docs Out Of The Way To Make Way For New Ones (#1422)

    Miscellaneous

    • Bump Dependencies (#1331)
    • Increase Golangci-Lint Timeout (#1364)

    :hammer: CI

    • Bump Go To 1.19 (#1201)
    • Bump Go (#1215)
    • Bump Golangci-Lint And Address Issues (#1233)
    • Bump Go To 1.19.1 (#1262)
    • Revert Jenkins File To 1.19 (#1267)
    • Bump Golangci-Lint (#1328)
    harvest-22.11.0-1.amd64.deb(62.80 MB)
    harvest-22.11.0-1.x86_64.rpm(52.23 MB)
    harvest-22.11.0-1_linux_amd64.tar.gz(68.49 MB)
  • v22.08.0 (Aug 19, 2022)

    22.08.0 / 2022-08-19

    :rocket: Highlights of this major release include:

    • :sparkler: an ONTAP event management system (EMS) events collector with 64 events out-of-the-box

    • Two new dashboards added in this release

      • Headroom dashboard
      • Quota dashboard
    • We've made lots of improvements to the REST Perf collector. The REST Perf collector should be considered early-access as we continue to improve it. This feature requires ONTAP versions 9.11.1 and higher.

    • New max plugin that creates new metrics from the maximum of existing metrics by label.

    • New compute_metric plugin that creates new metrics by combining existing metrics with mathematical operations.

    • 48 features, 45 bug fixes, and 11 documentation commits this release

    IMPORTANT :bangbang: NetApp is moving their communities from Slack to NetApp's Discord with a plan to lock the Slack channel at the end of August. Please join us on Discord!

    IMPORTANT :bangbang: Prometheus version 2.26 or higher is required for the EMS Collector.

    IMPORTANT :bangbang: After upgrade, don't forget to re-import your dashboards, so you get all the new enhancements and fixes. You can import them via the bin/harvest/grafana import CLI or from the Grafana UI.

    Known Issues

    Podman is unable to pull from NetApp's container registry cr.netapp.io. Until the issue is resolved, Podman users can pull from a separate proxy like this: podman pull netappdownloads.jfrog.io/oss-docker-harvest-production/harvest:latest.

    IMPORTANT 7-mode filers that are not on the latest release of ONTAP may experience TLS connection issues with errors like tls: server selected unsupported protocol version 301. This is caused by a change in Go 1.18: the default minimum TLS version for client connections was changed to TLS 1.2. Please upgrade your 7-mode filers (recommended) or set tls_min_version: tls10 in your harvest.yml poller section. See #1007 for more details.

    The Unix collector is unable to monitor pollers running in containers. See #249 for details.

    Enhancements

    • :sparkler: Harvest adds an ONTAP event management system (EMS) events collector in this release. It collects ONTAP events, exports them to Prometheus, and provides integration with Prometheus AlertManager. Full list of 64 events

    • New Harvest Headroom dashboard. #1039 Thanks to @faguayot for reporting.

    • New Quota dashboard. #1111 Thanks to @ev1963 for raising this feature request.

    • We've made lots of improvements to the REST Perf collector and filled several gaps in this release. #881

    • Harvest Power dashboard should include Min Ambient Temp and Min Temp. Thanks to Papadopoulos Anastasios for reporting.

    • Harvest Disk dashboard should include the Back-to-back CP Count and Write Latency metrics. #1040 Thanks to @faguayot for reporting.

    • Rest templates should be disabled by default until ONTAP removes ZAPI support. That way, Harvest does not double-collect and store metrics.

    • Harvest dashboard names should be prefixed with ONTAP: instead of NetApp Detail:. #1080. Thanks to Martin Möbius for reporting.

    • Harvest Qtree dashboard should show Total Qtree IOPs and Internal IOPs panels and Qtree filter. #1079 Thanks to @mamoep for reporting.

    • Harvest Cluster dashboard should show SVM Performance panel. #1117 Thanks to @Falcon667 for reporting.

    • Combine SnapMirror and Data Protection dashboards. #1082. Thanks to Martin Möbius for reporting.

    • vscan performance object should be enabled by default. #1182 Thanks to Gabriel Conne for reporting on Slack.

    • Lun and Volume dashboard should use topk range. #1184 Thanks to Papadopoulos Anastasios for reporting on Slack. These changes make these dashboards more consistent with Harvest 1.6.

    • New MetricAgent plugin. It is used to manipulate metrics based on a set of rules.

    • New Max plugin. It creates a new collection of metrics by calculating the maximum of metric values from an existing matrix for a given label.

    • bin/zapi should support querying multiple performance counters. #1167

    • Harvest REST private CLI should include filter support

    • Harvest should support request/response logging in Rest/RestPerf Collector.

    • Harvest's maximum log file size is reduced from 10 MB to 5 MB, and the maximum number of log files is reduced from 10 to 5.

    • Harvest should consolidate log messages and reduce noise.

    Fixes

    • Missing Ambient Temperature for AFF900 in Power Dashboard. #1173 Thanks to @iStep2Step for reporting.

    • Flexgroup latency should match the values reported by ONTAP CLI. #1060 Thanks to @josepaulog for reporting.

    • Perf Zapi Volume label should match Zapi Volume label. The label type was changed to style for Perf ZAPI Volume. #1055 Thanks to Papadopoulos Anastasios for reporting.

    • Zapi:SecurityCert should handle certificates per SVM instead of reporting duplicate instance key errors. #1075 Thanks to @mamoep for reporting.

    • Zapi:SecurityAccount should handle per switch SNMP users instead of reporting duplicate instance key errors. #1088 Thanks to @mamoep for reporting.

    • Wrong throughput units in Disk dashboard. #1091 Thanks to @Falcon667 for reporting.

    • Qtree Dashboard shows no data when SVM/Volume are selected from dropdown. #1099 Thanks to Papadopoulos Anastasios for reporting.

    • Virus Scan connections Active panel in SVM dashboard shows decimal places in Y axis. #1101 Thanks to Rene Meier for reporting.

    • Add Disk Utilization per Aggregate description in Disk Dashboard. #1193 Thanks to @faguayot for reporting.

    • Prometheus exporter should escape label_value. #1128 Thanks to @vavdoshka for reporting.

    • Grafana import dashboard fails if anonymous access is enabled. #1132 Thanks to @iStep2Step for reporting.

    • Improve color consistency and hover information on Compliance/Data Protection dashboards. #1083 Thanks to Rene Meier for reporting.

    • Compliance & Security Dashboards the text is unreadable with Grafana light theme. #1078 Thanks to @mamoep for reporting.

    • InfluxDB exporter should not require bucket, org, port, or precision fields when using url. #1155 Thanks to li fi for reporting.

    • Node CPU Busy and Disk Utilization should match the same metrics reported by ONTAP sysstat -m CLI. #1152 Thanks to Papadopoulos Anastasios for reporting.

    • Harvest should detect counter overflow and report it as 0. #762 Thanks to @rodenj1 for reporting.
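    Counter-overflow detection of this kind generally reduces to clamping negative deltas between successive samples of a monotonically increasing counter. A minimal sketch in Go (illustrative only, not Harvest's actual implementation):

    ```go
    package main

    import "fmt"

    // counterDelta returns the increase between two samples of a
    // monotonically increasing counter. If the counter wrapped or the
    // source was reset, the raw delta is negative; report 0 instead of
    // a huge bogus value.
    func counterDelta(prev, curr int64) int64 {
    	d := curr - prev
    	if d < 0 {
    		return 0 // overflow or reset detected
    	}
    	return d
    }

    func main() {
    	fmt.Println(counterDelta(100, 250)) // normal growth: 150
    	fmt.Println(counterDelta(900, 10))  // wrapped or reset: 0
    }
    ```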

    • Zerolog console logger fails to log stack traces. #1044

    harvest-22.08.0-1.amd64.deb(60.14 MB)
    harvest-22.08.0-1.x86_64.rpm(49.59 MB)
    harvest-22.08.0-1_linux_amd64.tar.gz(64.93 MB)
  • v22.05.0 (May 11, 2022)

    Releases

    22.05.0 / 2022-05-11

    :rocket: Highlights of this major release include:

    • Early access to ONTAP REST perf collector from ONTAP 9.11.1GA+

    • :hourglass: New Container Registry - Several of you have mentioned that you are being rate-limited when pulling Harvest Docker images from DockerHub. To alleviate this problem, we're publishing Harvest images to NetApp's container registry (cr.netapp.io). Going forward, we'll publish images to both DockerHub and cr.netapp.io. More information in the FAQ. No action is required unless you want to switch from DockerHub to cr.netapp.io. If so, the FAQ has you covered.

    • Five new dashboards added in this release

      • Power dashboard
      • Compliance dashboard
      • Security dashboard
      • Qtree dashboard
      • NFSv4 Store Pool dashboard (disabled by default)
    • New value_to_num_regex plugin allows you to map all matching expressions to 1 and non-matching ones to 0.

    • Harvest pollers can optionally read credentials from a mounted volume or file. This enables Hashicorp Vault support and works especially well with Vault agent
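    A poller points at such a file from harvest.yml. A minimal sketch (the credentials_file key name, poller name, and paths are assumptions to verify against your Harvest version's documentation):

    ```yaml
    Pollers:
      cluster-01:                 # hypothetical poller name
        addr: 10.0.0.20           # hypothetical cluster address
        credentials_file: /vault/secrets/harvest-creds.yml  # e.g. a file rendered by Vault agent
    ```

    Keeping the username and password in the referenced file means secrets never live in harvest.yml itself.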

    • bin/grafana import provides a --multi flag that rewrites dashboards to include multi-select dropdowns for each variable at the top of the dashboard

    • The conf/rest collector templates received a lot of attention this release. All known gaps between the ZAPI and REST collectors have been filled, and there is full parity between the two from ONTAP 9.11+. :metal:

    • 24 bug fixes, 48 features, and 5 documentation commits this release

    IMPORTANT :bangbang: After upgrade, don't forget to re-import your dashboards so you get all the new enhancements and fixes. You can import them via the bin/harvest/grafana import CLI or from the Grafana UI.

    IMPORTANT The conf/zapiperf/cdot/9.8.0/object_store_client_op.yaml ZapiPerf template is being deprecated in this release and will be removed in the next release of Harvest. No dashboards use the counters defined in this template and all counters are being deprecated by ONTAP. If you are using these counters, please create your own copy of the template.

    Known Issues

    IMPORTANT 7-mode filers that are not on the latest release of ONTAP may experience TLS connection issues with errors like tls: server selected unsupported protocol version 301. This is caused by a change in Go 1.18: the default minimum TLS version for client connections was changed to TLS 1.2. Please upgrade your 7-mode filers (recommended) or set tls_min_version: tls10 in your harvest.yml poller section. See #1007 for more details.

    The Unix collector is unable to monitor pollers running in containers. See #249 for details.

    Enhancements

    • Harvest should include a Power dashboard that shows power consumed, temperatures and fan speeds at a node and shelf level #932 and #903

    • Harvest should include a Security dashboard that shows authentication methods and certificate expiration details for clusters, volume encryption and status of anti-ransomware for volumes and SVMs #935

    • Harvest should include a Compliance dashboard that shows compliance status of clusters and SVMs along with individual compliance attributes #935

    • SVM dashboard should show antivirus counters in the CIFS drill-down section #913 Thanks to @burkl for reporting

    • Cluster and Aggregate dashboards should show Storage Efficiency Ratio metrics #888 Thanks to @Falcon667 for reporting

    • :construction: This is another step in the ZAPI to REST road map. In earlier releases we focused on config ZAPIs; in this release we've added early access to an ONTAP REST perf collector. :confetti_ball: The REST perf collector and the thirty-nine templates included in this release require ONTAP 9.11.1 GA+. :astonished: These should be considered early access as we continue to improve them. If you try them out or have any feedback, let us know on Slack or GitHub #881

    • Harvest should collect NFS v4.2 counters which are new in ONTAP 9.11+ releases #572

    • Plugin logging should include object detail #986

    • Harvest dashboards should use Time series panels instead of Graph (old) panels #972. Thanks to @ybizeul for raising

    • New regex based plugin value_to_num_regex helps map labels to numeric values for Grafana dashboards.

    • Harvest status should run on systems without pgrep #937 Thanks to Dan Butler for reporting this on Slack

    • When using a credentials file and the poller is not found, also consult the defaults section of the harvest.yml file #936

    • Harvest should include an NFSv4 StorePool dashboard that shows NFSv4 store pool locks and allocation detail #921 Thanks to Rusty Brown for contributing this dashboard.

    • REST collector should report cpu-busytime for node #918 Thanks to @pilot7777 for reporting this on Slack

    • Harvest should include a Qtree dashboard that shows Qtree NFS/CIFS metrics #812 Thanks to @ev1963 for reporting

    • Harvest should support reading credentials from an external file or mounted volume #905

    • Grafana dashboards should have a checkbox to show multiple objects in the variable drop-down. See comment for details. #815 #939 Thanks to @manuelbock, @bcase303 for reporting

    • Harvest should include Prometheus port (promport) to metadata metric #878

    • Harvest should use NetApp's container registry for Docker images #874

    • Increase ZAPI client timeout for default and volume object #1005

    • REST collector should support retrieving a subset of objects via template filtering #950

    • Harvest should support minimum TLS version config #1007 Thanks to @jmg011 for reporting and verifying this

    Fixes

    • SVM Latency numbers differ significantly on Harvest 1.6 vs Harvest 2.0 #1003 See discussion as well. Thanks to @jmg011 for reporting

    • Harvest should include regex patterns to ignore transient volumes related to backup #929. Not enabled by default, see conf/zapi/cdot/9.8.0/volume.yaml for details. Thanks to @ybizeul for reporting

    • Exclude OS aggregates from capacity used graph #327 Thanks to @matejzero for raising

    • A few panels in the Data Protection dashboard need the instant property #945

    • CPU overload when there are several thousand quotas #733 Thanks to @Flo-Fly for reporting

    • Include 7-mode CLI role commands for Harvest user #891 Thanks to @ybizeul for reporting and providing the changes!

    • Zapi Collector fails to collect data if number of records on a poller is equal to batch size #870 Thanks to @unbreakabl3 on Slack for reporting

    • Wrong object name used in conf/zapi/cdot/9.8.0/snapshot.yaml #862 Thanks to @pilot7777 for reporting

    • Field access-time returned by snapshot-get-iter should be creation-time #861 Thanks to @pilot7777 for reporting

    • Harvest panics when trying to merge empty template #859 Thanks to @pilot7777 for raising

    harvest-22.05.0-1.amd64.deb(58.35 MB)
    harvest-22.05.0-1.x86_64.rpm(48.38 MB)
    harvest-22.05.0-1_linux_amd64.tar.gz(63.01 MB)
  • v22.02.0 (Feb 15, 2022)


    22.02.0 / 2022-02-15

    :boom: Highlights of this major release include:

    • Continued progress on the ONTAP REST config collector. Most of the template changes are in place and we're working on closing the gaps between ZAPI and REST. We've made lots of improvements to the REST collector and included 13 REST templates in this release. The REST collector should be considered early-access as we continue to improve it. If you try it out or have any feedback, let us know on Slack or GitHub. :book: You can find more information about when you should switch from ZAPI to REST, what versions of ONTAP are supported by Harvest's REST collector, and how to fill ONTAP gaps between REST and ZAPI documented here

    • Many of you asked for nightly builds. We have them. :confetti_ball: We're also working on publishing to multiple Docker registries since you've told us you're running into rate-limiting problems with DockerHub. We'll announce it here and on Slack when we have a solution in place.

    • Two new Data Protection dashboards

    • bin/grafana cli should not overwrite dashboard changes, making it simpler to import/export dashboards, and enabling round-tripping dashboards (import, export, re-import)

    • New include_contains plugin allows you to select a subset of objects. e.g. selecting only volumes with custom-defined ONTAP metadata

    • We've included more out-of-the-box Prometheus alerts. Keep sharing your most useful alerts!

    • 7-mode workflows continue to be improved :heart: Harvest now collects Qtree and Quota counters from 7-mode filers (these are already collected in cDOT)

    • 28 bug fixes, 52 features, and 11 documentation commits this release

    IMPORTANT Admin node certificate file location changed. Certificate files have been consolidated into the cert directory. If you created self-signed admin certs, you need to move the admin-cert.pem and admin-key.pem files into the cert directory.

    IMPORTANT In earlier versions of Harvest, the Qtree template exported the vserver metric. This counter was changed to svm to be consistent with other templates. If you are using the qtree vserver metric, you will need to update your queries to use svm instead.

    IMPORTANT :bangbang: After upgrade, don't forget to re-import your dashboards so you get all the new enhancements and fixes. You can import them via the bin/harvest/grafana import CLI or from the Grafana UI.

    IMPORTANT The LabelAgent value_mapping plugin was deprecated in the 21.11 release and removed in 22.02. Use LabelAgent value_to_num instead. See docs for details.

    Known Issues

    The Unix collector is unable to monitor pollers running in containers. See #249 for details.

    Enhancements

    • Harvest should include a Data Protection dashboard that shows volumes protected by snapshots, which ones have exceeded their reserve copy, and which are unprotected #664

    • Harvest should include a Data Protection SnapMirror dashboard that shows which volumes are protected, how they're protected, their protection relationship, along with their health and lag durations.

    • Harvest should provide nightly builds to GitHub and DockerHub #713

    • Harvest bin/grafana cli should not overwrite dashboard changes, making it simpler to import/export dashboards, and enabling round-tripping dashboards (import, export, re-import) #831 Thanks to @luddite516 for reporting and @florianmulatz for iterating with us on a solution

    • Harvest should provide a include_contains label agent plugin for filtering #735 Thanks to @chadpruden for reporting

    • Improve Harvest's container compatibility with K8s via kompose. #655 See also the related discussion.

    • The ZAPI cli tool should include counter types when querying ZAPIs #663

    • Harvest should include a richer set of Prometheus alerts #254 Thanks @demalik for raising

    • Template plugins should run in the order they are defined and compose better. The output of one plugin can be fed into the input of the next one. #736 Thanks to @chadpruden for raising

    • Harvest should collect Antivirus counters when ONTAP offbox vscan is configured #346 Thanks to @burkl and @Falcon667 for reporting

    • Document how to run Harvest with containerd and Rancher

    • Qtree counters should be collected for 7-mode filers #766 Thanks to @jmg011 for raising this issue and iterating with us on a solution

    • Harvest admin node should work with pollers running in Docker compose #678

    • Document how to run Harvest with Podman. Several RHEL customers asked about this since Podman ships as the default container runtime on RHEL8+.

    • Harvest should include a Systemd service file for the HTTP service discovery admin node #656

    • Document how ZAPI collectors, templates, and exporting work together. Thanks @jmg011 and others for asking for this

    • Remove redundant dashboards (Network, Node, SVM, Volume) #703 Thanks to @mamoep for reporting this

    • Harvest generate docker command should support customer-supplied Prometheus and Grafana ports. #584

    • Harvest certificate authentication should work with self-signed subject alternative name (SAN) certificates. Improve documentation on how to use certificate authentication. Thanks to @edd1619 for raising this issue

    • Harvest's Prometheus exporter should optionally sort labels. Without sorting, VictoriaMetrics marks metrics stale. #756 Thanks to @mamoep for reporting and verifying

    • Harvest should optionally log to a file when running in the foreground. Handy for instantiated instances running on OSes that have poor support for journalctl #813 and #810 Thanks to @mamoep for reporting and verifying this works in a nightly build

    • Harvest should collect workload concurrency #714

    • Harvest certificate directory should be included in a container's volume mounts #725

    • MetroCluster dashboard should show path object metrics #746

    • Harvest should collect namespace resources from ONTAP #749

    • Harvest should be more resilient to cluster connectivity issues #480

    • Harvest Grafana dashboard version string should match the Harvest release #631

    • REST collector improvements

      • Harvest REST collector should support ONTAP private cli endpoints #766

      • REST collector should support ZAPI-like object prefixing #786

      • REST collector should support computing new customer-defined metrics #780

      • REST collector should collect aggregate, qtree and quota counters #780

      • REST collector metrics should be reported in autosupport #841

      • REST collector should collect sensor counters #789

      • Collect network port interface information not available via ZAPI #691 Thanks to @pilot7777, @mamoep and @wagneradrian92 for working on this with us

      • Publish REST collector document that highlights when you should switch from ZAPI to REST, what versions of ONTAP are supported by Harvest's REST collectors and how to fill ONTAP gaps between REST and ZAPI

      • REST collector should support Quota, Shelf, Snapmirror, and Volume plugins #799 and #811

    • Improve troubleshooting and documentation on validating certificates from macOS #723

    • Harvest should read its config information from the environment variable HARVEST_CONFIG when supplied. This env var has higher precedence than the --config command-line argument. #645
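    Because the environment variable wins over --config, a wrapper script can select a config without extra flags. A minimal sketch (paths are hypothetical):

    ```shell
    # HARVEST_CONFIG takes precedence over the --config flag.
    export HARVEST_CONFIG=/opt/custom/harvest.yml
    echo "Harvest will read: $HARVEST_CONFIG"

    # bin/harvest start   # would now use /opt/custom/harvest.yml
    ```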

    Fixes

    • FlexGroup statistics should be aggregated across node and aggregates #706 Thanks to @wally007 for reporting

    • Network Details dashboard should use correct units and support variable sorting #673 Thanks to @mamoep for reporting and reviewing the fix

    • Harvest Systemd service should wait for network to start #707 Thanks to @mamoep for reporting and fixing

    • MetroCluster dashboard should use correct units and support variable sorting #685 Thanks to @mamoep and @chris4789 for reporting this

    • 7mode shelf plugin should handle cases where multiple channels have the same shelf id #692 Thanks to @pilot7777 for reporting this on Slack

    • Improve template YAML parsing when indentation varies #704 Thanks to @mamoep for reporting this.

    • Harvest should not include version information in its container name. #660. Thanks to @wally007 for raising this.

    • Ignore missing Qtrees and improve uniqueness check on 7mode filers #782 and #797. Thanks to @jmg011 for reporting

    • Qtree instance key order should be unified between 7mode and cDOT #807 Thanks to @jmg011 for reporting

    • Workload detail volume collection should not try to create duplicate counters #803 Thanks to @luddite516 for reporting

    • Harvest HTTP service discovery node should not attempt to publish Prometheus metrics to InfluxDB #684

    • Grafana import should save auth token to the config file referenced by HARVEST_CONFIG when that environment variable exists #681

    • bin/zapi should print output #715

    • Snapmirror dashboard should show correct number of SVM-DR relationships, last transfer, and health status #728 Thanks to Gaël Cantarero on Slack for reporting

    • Ensure that properties defined in object templates override their parent properties #765

    • Increase time that metrics are retained in Prometheus exporter from 3 minutes to 5 minutes #778

    • Remove the misplaced SVM FCP Throughput panel from the iSCSI drilldown section of the SVM details dashboard #821 Thanks to @florianmulatz for reporting and fixing

    • When importing Grafana dashboards, remove the existing id and uid so Grafana treats the import as a create instead of an overwrite #825 Thanks to @luddite516 for reporting

    • Relax the Grafana version check constraint so version 8.4.0-beta1 is considered >=7.1 #828 Thanks to @ybizeul for reporting

    • bin/harvest status should report running for pollers exporting to InfluxDB, instead of reporting that they are not running #835

    • Pin the Grafana and Prometheus versions in the Docker compose workflow instead of pulling latest #822

    Source code(tar.gz)
    Source code(zip)
    harvest-22.02.0-4.amd64.deb(56.12 MB)
    harvest-22.02.0-4.x86_64.rpm(47.15 MB)
    harvest-22.02.0-4_linux_amd64.tar.gz(60.56 MB)
  • v21.11.1(Dec 10, 2021)

    Change Log

    Releases

    21.11.1 / 2021-12-10

    This release is the same as 21.11.0 plus an FSx dashboard fix for #737. If you are not monitoring an FSx system, there is no need to upgrade. We reverted a node-labels check in those dashboards because Harvest does not collect node data from FSx systems.

    Highlights of this major release include:

    • Early access to ONTAP REST collector
    • Support for Prometheus HTTP service discovery
    • New MetroCluster dashboard
    • Qtree and Quotas collection
    • We heard your ask, and we made it happen. We've separated cDOT and 7mode dashboards so each can evolve independently
    • Label sets allow you to add additional key-value pairs to a poller's metrics #538 and expose those labels in your dashboards
    • Template merging was improved to keep your template changes separate from Harvest's
    • Harvest pollers are more deterministic about picking free ports
    • 37 bug fixes

    IMPORTANT The LabelAgent value_mapping plugin is being deprecated in this release and will be removed in the next release of Harvest. Use LabelAgent value_to_num instead. See docs for details.
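
    As a hypothetical migration sketch (the metric, label, and values below are illustrative; see the Harvest plugin documentation for the authoritative syntax), a value_to_num rule in an object template creates a numeric metric from a label, yielding 1 when the label matches either listed value and the backtick-quoted default otherwise:

```yaml
# Illustrative object-template fragment, not a drop-in config:
# create metric "new_status" = 1 when label "state" equals "up",
# otherwise fall back to the default 0 (the backtick-quoted value).
plugins:
  LabelAgent:
    value_to_num:
      - new_status state up up `0`
```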

    IMPORTANT After upgrading, don't forget to re-import all dashboards so you get new dashboard enhancements and fixes. You can re-import via the bin/harvest grafana CLI or from the Grafana UI.

    IMPORTANT RPM and Debian packages will be deprecated in the future, replaced with Docker and native binaries. See #330 for details and tell us what you think. Several of you have already weighed-in. Thanks! If you haven't, please do.

    Known Issues

    The Unix collector is unable to monitor pollers running in containers. See #249 for details.

    Enhancements

    • :construction: ONTAP started moving their APIs from ZAPI to REST in ONTAP 9.6. Harvest adds an early access ONTAP REST collector in this release (config only). :confetti_ball: This is our first step among several as we prepare for the day that ZAPIs are turned off. The REST collector and seven templates are included in 21.11. These should be considered early access as we continue to improve them. If you try them out or have any feedback, let us know on Slack or GitHub. #402

    • Harvest should have a Prometheus HTTP service discovery end-point to make it easier to add/remove pollers #575
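
    Once the end-point is enabled, Prometheus (v2.28 or later) can discover pollers with an http_sd_configs block. A sketch, assuming a hypothetical admin node listening on localhost:8887 (adjust the URL and path to your deployment):

```yaml
# prometheus.yml fragment: scrape every poller Harvest advertises
scrape_configs:
  - job_name: harvest
    http_sd_configs:
      - url: http://localhost:8887/api/v1/sd   # hypothetical admin-node URL
```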

    • Harvest should include a MetroCluster dashboard #539 Thanks @darthVikes for reporting

    • Harvest should collect Qtree and Quota metrics #522 Thanks @jmg011 for reporting and validating this works in your environment

    • SVM dashboard: Make NFS version a variable. SVM variable should allow selecting all SVMs for a cluster wide view #454

    • Harvest should monitor ONTAP chassis sensors #384 Thanks to @hashi825 for raising this issue and reviewing the pull request

    • Harvest cluster dashboard should include All option in dropdown for clusters #630 thanks @TopSpeed for raising this on Slack

    • Harvest should collect volume sis status #519 Thanks to @jmg011 for raising

    • Separate cDOT and 7-mode dashboards allowing each to change independently #489 #501 #547

    • Improve collector and object template merging and documentation #493 #555 Thanks @hashi825 for reviewing and suggesting improvements

    • Harvest should support label sets, allowing you to add additional key-value pairs to a poller's metrics #538

    • bin/grafana import should create a matching label and rewrite queries to use chained variable when using label sets #550

    • Harvest pollers should reuse their previous Prometheus port when restarted and be more deterministic about picking free ports #596 #595 Thanks to @cordelster for reporting

    • Improve instantiated systemd template by specifying user/group, requires, and moving Unix pollers to the end of the list. #643 Thanks to @mamoep for reporting and providing the changes! :sparkles:

    • Harvest's Docker container should use local conf directory instead of copying into image. Makes upgrade and changing template files easier. #511

    • Improve Disk dashboard by showing total number of disks by node and aggregate #583

    • Harvest 7-mode dashboards should be provisioned when using Docker Compose workflow #544

    • When upgrading, bin/harvest grafana import should add dashboards to a release-named folder so earlier dashboards are not overwritten #616

    • client_timeout should be overridable in object template files #563

    • Increase ZAPI client timeouts for volume and workloads objects #617

    • Doctor: Ensure that all pollers export to unique Prometheus ports #597

    • Improve execution performance of Harvest management commands :rocket: bin/harvest start|stop|restart #600

    • Include eight cDOT dashboards that use InfluxDB datasource #466. Harvest does not support InfluxDB dashboards for 7-mode. Thanks to @SamyukthaM for working on these

    • Docs: Describe how Harvest converts template labels into Prometheus labels #585

    • Docs: Improve Matrix documentation to better align with code #485

    • Docs: Improve ARCHITECTURE.md #603

    Fixes

    • Poller should report metadata when running on BusyBox #529 Thanks to @charz for reporting issue and providing details

    • Space used % calculation was incorrect for Cluster and Aggregate dashboards #624 Thanks to @faguayot and @jorbour for reporting.

    • When ONTAP indicates a counter is deprecated, but a replacement is not provided, continue using the deprecated counter #498

    • Harvest dashboard panels must specify a Prometheus datasource to correctly handle cases where a non-Prometheus default datasource is defined in Grafana. #639 Thanks for reporting @MrObvious

    • Prometheus datasource was missing on five dashboards (Network and Disk) #566 Thanks to @survive-wustl for reporting

    • Document permissions that Harvest requires to monitor ONTAP with a read-only user #559 Thanks to @survive-wustl for reporting and working with us to chase this down. :thumbsup:

    • Metadata dashboard should show correct status for running/stopped pollers #567 Thanks to @cordelster for reporting

    • Harvest should serve a human-friendly :corn: overview page of metric types when hitting the Prometheus end-point #613 Thanks @cordelster for reporting

    • SnapMirror plugin should include source_node #608

    • Disk dashboard should use better labels in table details #578

    • SVM dashboard should show correct units and remove duplicate graph #454

    • FCP plugin should work with 7-mode clusters #464

    • Node values are missing from some 7-mode perf counters #467

    • Nic state is missing from several network related dashboards #486

    • Reduce log noise when templates are not found since this is often expected #606

    • Use diagnosis-config-get-iter to collect node status from 7-mode systems #499

    • Node status is missing from 7-mode #527

    • Improve 7-mode templates. Remove cluster from 7-mode. yamllint all templates #531

    • When saving Grafana auth token, make sure bin/grafana writes valid Yaml #544

    • Improve Yaml parsing when different levels of indentation are used in harvest.yml. You should see fewer invalid indentation messages. :clap: #626

    • Unix poller should ignore /proc files that aren't readable #249

    Source code(tar.gz)
    Source code(zip)
    harvest-21.11.1-1.amd64.deb(54.40 MB)
    harvest-21.11.1-1.x86_64.rpm(45.42 MB)
    harvest-21.11.1-1_linux_amd64.tar.gz(58.75 MB)
  • v21.11.0(Nov 8, 2021)

    Change Log

    Releases

    21.11.0 / 2021-11-08

    Highlights of this major release include:

    • Early access to ONTAP REST collector
    • Support for Prometheus HTTP service discovery
    • New MetroCluster dashboard
    • Qtree and Quotas collection
    • We heard your ask, and we made it happen. We've separated cDOT and 7mode dashboards so each can evolve independently
    • Label sets allow you to add additional key-value pairs to a poller's metrics #538 and expose those labels in your dashboards
    • Template merging was improved to keep your template changes separate from Harvest's
    • Harvest pollers are more deterministic about picking free ports
    • 37 bug fixes

    IMPORTANT The LabelAgent value_mapping plugin is being deprecated in this release and will be removed in the next release of Harvest. Use LabelAgent value_to_num instead. See docs for details.

    IMPORTANT After upgrading, don't forget to re-import all dashboards so you get new dashboard enhancements and fixes. You can re-import via the bin/harvest grafana CLI or from the Grafana UI.

    IMPORTANT RPM and Debian packages will be deprecated in the future, replaced with Docker and native binaries. See #330 for details and tell us what you think. Several of you have already weighed-in. Thanks! If you haven't, please do.

    Known Issues

    The Unix collector is unable to monitor pollers running in containers. See #249 for details.

    Enhancements

    • :construction: ONTAP started moving their APIs from ZAPI to REST in ONTAP 9.6. Harvest adds an early access ONTAP REST collector in this release (config only). :confetti_ball: This is our first step among several as we prepare for the day that ZAPIs are turned off. The REST collector and seven templates are included in 21.11. These should be considered early access as we continue to improve them. If you try them out or have any feedback, let us know on Slack or GitHub. #402

    • Harvest should have a Prometheus HTTP service discovery end-point to make it easier to add/remove pollers #575

    • Harvest should include a MetroCluster dashboard #539 Thanks @darthVikes for reporting

    • Harvest should collect Qtree and Quota metrics #522 Thanks @jmg011 for reporting and validating this works in your environment

    • SVM dashboard: Make NFS version a variable. SVM variable should allow selecting all SVMs for a cluster wide view #454

    • Harvest should monitor ONTAP chassis sensors #384 Thanks to @hashi825 for raising this issue and reviewing the pull request

    • Harvest cluster dashboard should include All option in dropdown for clusters #630 thanks @TopSpeed for raising this on Slack

    • Harvest should collect volume sis status #519 Thanks to @jmg011 for raising

    • Separate cDOT and 7-mode dashboards allowing each to change independently #489 #501 #547

    • Improve collector and object template merging and documentation #493 #555 Thanks @hashi825 for reviewing and suggesting improvements

    • Harvest should support label sets, allowing you to add additional key-value pairs to a poller's metrics #538

    • bin/grafana import should create a matching label and rewrite queries to use chained variable when using label sets #550

    • Harvest pollers should reuse their previous Prometheus port when restarted and be more deterministic about picking free ports #596 #595 Thanks to @cordelster for reporting

    • Improve instantiated systemd template by specifying user/group, requires, and moving Unix pollers to the end of the list. #643 Thanks to @mamoep for reporting and providing the changes! :sparkles:

    • Harvest's Docker container should use local conf directory instead of copying into image. Makes upgrade and changing template files easier. #511

    • Improve Disk dashboard by showing total number of disks by node and aggregate #583

    • Harvest 7-mode dashboards should be provisioned when using Docker Compose workflow #544

    • When upgrading, bin/harvest grafana import should add dashboards to a release-named folder so earlier dashboards are not overwritten #616

    • client_timeout should be overridable in object template files #563

    • Increase ZAPI client timeouts for volume and workloads objects #617

    • Doctor: Ensure that all pollers export to unique Prometheus ports #597

    • Improve execution performance of Harvest management commands :rocket: bin/harvest start|stop|restart #600

    • Include eight cDOT dashboards that use InfluxDB datasource #466. Harvest does not support InfluxDB dashboards for 7-mode. Thanks to @SamyukthaM for working on these

    • Docs: Describe how Harvest converts template labels into Prometheus labels #585

    • Docs: Improve Matrix documentation to better align with code #485

    • Docs: Improve ARCHITECTURE.md #603

    Fixes

    • Poller should report metadata when running on BusyBox #529 Thanks to @charz for reporting issue and providing details

    • Space used % calculation was incorrect for Cluster and Aggregate dashboards #624 Thanks to @faguayot and @jorbour for reporting.

    • When ONTAP indicates a counter is deprecated, but a replacement is not provided, continue using the deprecated counter #498

    • Harvest dashboard panels must specify a Prometheus datasource to correctly handle cases where a non-Prometheus default datasource is defined in Grafana. #639 Thanks for reporting @MrObvious

    • Prometheus datasource was missing on five dashboards (Network and Disk) #566 Thanks to @survive-wustl for reporting

    • Document permissions that Harvest requires to monitor ONTAP with a read-only user #559 Thanks to @survive-wustl for reporting and working with us to chase this down. :thumbsup:

    • Metadata dashboard should show correct status for running/stopped pollers #567 Thanks to @cordelster for reporting

    • Harvest should serve a human-friendly :corn: overview page of metric types when hitting the Prometheus end-point #613 Thanks @cordelster for reporting

    • SnapMirror plugin should include source_node #608

    • Disk dashboard should use better labels in table details #578

    • SVM dashboard should show correct units and remove duplicate graph #454

    • FCP plugin should work with 7-mode clusters #464

    • Node values are missing from some 7-mode perf counters #467

    • Nic state is missing from several network related dashboards #486

    • Reduce log noise when templates are not found since this is often expected #606

    • Use diagnosis-config-get-iter to collect node status from 7-mode systems #499

    • Node status is missing from 7-mode #527

    • Improve 7-mode templates. Remove cluster from 7-mode. yamllint all templates #531

    • When saving Grafana auth token, make sure bin/grafana writes valid Yaml #544

    • Improve Yaml parsing when different levels of indentation are used in harvest.yml. You should see fewer invalid indentation messages. :clap: #626

    • Unix poller should ignore /proc files that aren't readable #249

    Source code(tar.gz)
    Source code(zip)
    harvest-21.11.0-1.amd64.deb(54.41 MB)
    harvest-21.11.0-1.x86_64.rpm(45.43 MB)
    harvest-21.11.0-1_linux_amd64.tar.gz(58.77 MB)
  • v21.08.0(Aug 31, 2021)

    Change Log

    Releases

    21.08.0 / 2021-08-31

    This major release introduces a Docker workflow that makes it a breeze to standup Grafana, Prometheus, and Harvest with auto-provisioned dashboards. There are seven new dashboards, example Prometheus alerts, and a bunch of fixes detailed below. We haven't forgotten about our 7-mode customers either and have a number of improvements in 7-mode dashboards with more to come.

    This release also sports the most external contributions to date. :metal: Thanks!

    With 284 commits since 21.05, there is a lot to summarize! Make sure you check out the full list of enhancements and improvements in the CHANGELOG.md since 21.05.

    IMPORTANT By default, Harvest relies on the autosupport sidecar binary to periodically send usage and support telemetry data to NetApp. Usage of the harvest-autosupport binary falls under the NetApp EULA. Automatic sending of this data can be disabled with the autosupport_disabled option. See Autosupport for details.

    IMPORTANT RPM and Debian packages will be deprecated in the future, replaced with Docker and native binaries. See #330 for details and tell us what you think. Several of you have already weighed-in. Thanks! If you haven't, please do.

    IMPORTANT After upgrading, don't forget to re-import all dashboards so you get new dashboard enhancements and fixes. You can re-import via the bin/harvest grafana CLI or from the Grafana UI.

    Known Issues

    We've improved several of the 7-mode dashboards this release, but there are still a number of gaps with 7-mode dashboards when compared to c-mode. We will address these in a point release by splitting the c-mode and 7-mode dashboards. See #423 for details.

    On RHEL and Debian, the example Unix collector does not work at the moment due to the harvest user lacking permissions to read the /proc filesystem. See #249 for details.

    Enhancements

    • Make it easy to install Grafana, Prometheus, and Harvest with Docker Compose and auto-provisioned dashboards. #349

    • Lun, Volume Details, Node Details, Network Details, and SVM dashboards added to Harvest. Thanks to @jgasher for contributing five solid dashboards. :tada: #458 #482

    • Disk dashboard added to Harvest with disk type, status, uptime, and aggregate information. Thanks to @faguayot, @bengoldenberg, and @talshechanovitz for helping with this feature #348 #375 #367 #361

    • New SVM dashboard with NFS v3, v4, and v4.1 frontend drill-downs. Thanks to @burkl for contributing these. :tada: #344

    • Harvest templates should be extendible without modifying the originals. Thanks to @madhusudhanarya and @albinpopote for reporting. #394 #396 #391

    • Sort all variables in Harvest dashboards in ascending order. Thanks to @florianmulatz for raising #350

    • Harvest should include example Prometheus alerting rules #414

    • Improved documentation on how to send new ZAPIs and modify existing ZAPI templates. Thanks to @albinpopote for reporting. #397

    • Improve Harvest ZAPI template selection when monitoring a broader set of ONTAP clusters including 7-mode and 9.10.X #407

    • Collectors should log their full ZAPI request/response(s) when their poller includes a log section #382

    • Harvest should load config information from the HARVEST_CONF environment variable when set. Thanks to @ybizeul for reporting. #368

    • Document how to delete time series data from Prometheus #393

    • Harvest ZAPI tool supports printing results in XML and colors. This makes it easier to post-process responses in downstream pipelines #353

    • Harvest version should check for a new release and display it when available #323

    • Document how client authentication works and how to troubleshoot #325

    Fixes

    • ZAPI collector should recover after losing connection with ONTAP cluster for several hours. Thanks to @hashi825 for reporting this and helping us track it down #356

    • ZAPI templates with the same object name overwrite matrix data (impacted nfs and object_store_client_op templates). Thanks to @hashi825 for reporting this #462

    • Lots of fixes for 7-mode dashboards and data collection. Thanks to @madhusudhanarya and @ybizeul for reporting. There's still more work to do for 7-mode, but we understand some of our customers rely on Harvest to help them monitor these legacy systems. #383 #441 #423 #415 #376

    • Aggregate dashboard "space used column" should use current fill grade. Thanks to @florianmulatz for reporting. #351

    • When building RPMs don't compile Harvest Python test code. Thanks to @madhusudhanarya for reporting. #385

    • Harvest should collect NVMe and fiber channel port counters. Thanks to @jgasher for submitting these. #363

    • Harvest should export NFS v4 metrics. It does for v3 and v4.1, but did not for v4 due to a typo in the v4 ZAPI template. Thanks to @jgasher for reporting. #481

    • Harvest panics when port_range is used in the Prometheus exporter and address is missing. Thanks to @ybizeul for reporting. #357

    • Network dashboard fiber channel ports (FCP) should report read and write throughput #445

    • Aggregate dashboard panel titles should match the information being displayed #133

    • Harvest should handle ZAPIs that include signed integers. Most ZAPIs use unsigned integers, but a few return signed ones. Thanks for reporting @hashi825 #384

    Source code(tar.gz)
    Source code(zip)
    harvest-21.08.0-6.amd64.deb(47.94 MB)
    harvest-21.08.0-6.x86_64.rpm(39.31 MB)
    harvest-21.08.0-6_linux_amd64.tar.gz(51.99 MB)
  • v21.05.4(Jul 22, 2021)

    Change Log

    Releases

    21.05.4 / 2021-07-22

    This release introduces Qtree protocol collection, improved Docker and client authentication documentation, publishing to Docker Hub, and a new plugin that helps build richer dashboards, as well as a couple of important fixes for collector panics.

    IMPORTANT RPM and Debian packages will be deprecated in the future, replaced with Docker and native binaries. See #330 for details and tell us what you think.

    Known Issues

    On RHEL and Debian, the example Unix collector does not work at the moment due to the harvest user lacking permissions to read the /proc filesystem. See #249 for details.

    Enhancements

    • Harvest collects Qtree protocol ops #298. Thanks to Martin Möbius for contributing

    • Harvest Grafana tool (optionally) adds a user-specified prefix to all Dashboard metrics during import. See harvest grafana --help #87

    • Harvest is taking its first steps to talk REST: query ONTAP, show Swagger API, model, and definitions #292

    • Tagged releases of Harvest are published to Docker Hub

    • Harvest honors Go's http(s) environment variable proxy information. See https://pkg.go.dev/net/http#ProxyFromEnvironment for details #252
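
    In practice this means the standard proxy environment variables apply (the proxy hosts below are hypothetical; substitute your own):

```shell
# Harvest (via Go's net/http) honors the standard proxy variables.
export HTTPS_PROXY=http://proxy.example.com:3128
export NO_PROXY=localhost,127.0.0.1,.internal.example.com
# bin/harvest start   # pollers now reach ONTAP through the proxy
```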

    • New plugin value_to_num helps map labels to numeric values for Grafana dashboards. Current dashboards updated to use this plugin #319

    • harvest.yml supports YAML flow style. E.g. collectors: [Zapi] #260
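
    The two styles are equivalent; for example (the poller name and address are illustrative):

```yaml
Pollers:
  cluster-01:                      # illustrative poller
    addr: 10.0.1.1
    collectors: [Zapi, ZapiPerf]   # flow style
    exporters:                     # equivalent block style
      - prometheus
```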

    • New Simple collector that runs on macOS and Unix #270

    • Improve client certificate authentication documentation

    • Improve Docker deployment documentation 4019308

    Fixes

    • Harvest collector should not panic when resources are deleted from ONTAP #174 and #302. Thanks to @hashi825 and @mamoep for providing steps to reproduce

    • Shelf metrics should report on op-status for components. Thanks to @hashi825 for working with us on this fix and dashboard improvements #262

    • Harvest should not panic when InfluxDB is the only exporter #286

    • Volume dashboard space-used column should display with percentage filled. Thanks to @florianmulatz for reporting and suggesting a fix #303

    • Certificate authentication should honor path in harvest.yml #318

    • Harvest should not kill processes with poller in their arguments #328

    • Harvest ZAPI command line tool should limit perf-object-get-iter to subset of counters when using --counter #299


    Source code(tar.gz)
    Source code(zip)
    harvest-21.05.4-2.amd64.deb(37.18 MB)
    harvest-21.05.4-2.x86_64.rpm(30.91 MB)
    harvest-21.05.4-2_linux_amd64.tar.gz(40.87 MB)
  • v21.05.3(Jun 23, 2021)

    Change Log

    Releases

    21.05.3 / 2021-06-28

    This release introduces a significantly simplified way to connect Harvest and Prometheus, containerization enhancements, 7x faster Harvest build times, 3x smaller executables, cross-compilation support, and several dashboard and other fixes.

    :tada: Thanks especially to @hashi825, @mamoep, @matejzero, and @florianmulatz for opening issues and pitching in to help fix them this release.

    Known Issues

    On RHEL and Debian, the example Unix collector does not work at the moment due to the harvest user lacking permissions to read the /proc filesystem. See #249 for details.

    Enhancements

    • Create Prometheus port range exporter that allows you to connect multiple pollers to Prometheus without needing to specify a port-per-poller. This makes it much easier to connect Prometheus and Harvest; especially helpful when you're monitoring many clusters #172
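
    A sketch of what this looks like in harvest.yml (the exporter name and port range are illustrative):

```yaml
Exporters:
  prom-range:                 # illustrative exporter name
    exporter: Prometheus
    port_range: 12990-13010   # each poller claims a free port in this range
Defaults:
  exporters: [prom-range]     # every poller shares the one exporter definition
```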

    • Improve Harvest build times by 7x and reduce executable sizes by 3x #100

    • Improve containerization with the addition of a poller-per-container Dockerfile. Create a new subcommand harvest generate docker which generates a docker-compose.yml file for all pollers defined in your config

    • Improve systemd integration by using instantiated units for each poller and a harvest target to tie them together. Create a new subcommand harvest generate systemd which generates a Harvest systemd target for all pollers defined in your config #systemd

    • Harvest doctor checks that all Prometheus exporters specify a unique port #118

    • Harvest doctor warns when an unknown exporter type is specified (likely a spelling error) #118

    • Add Harvest CUE validation and type-checking #208

    • bin/zapi uses the --config command line option to read the harvest config file. This brings this tool inline with other Harvest tools. This makes it easier to switch between multiple sets of harvest.yml files.

    • Harvest no longer writes pidfiles, simplifying management code and installation #159

    Fixes

    • Ensure that the Prometheus exporter does not create duplicate labels #132

    • Ensure that the Prometheus exporter includes HELP and TYPE metatags when requested. Some tools require these #104

    • Disk status should return zero for a failed disk and one for a healthy disk. Thanks to @hashi825 for reporting and fixing #182

    • Lun info should be collected by Harvest. Thanks to @hashi825 for reporting and fixing #230

    • Grafana dashboard units, typo, and filtering fixes. Thanks to @mamoep, @matejzero, and @florianmulatz for reporting these :tada: #184 #186 #190 #192 #195 #202

    • Unix collector should not panic when harvest.yml is changed #160

    • Reduce log noise about poller lagging behind by few milliseconds. Thanks @hashi825 #214

    • Don't assume debug when foregrounding the poller process. Thanks to @florianmulatz for reporting. #246

    • Improve Docker all-in-one-container argument handling and simplify building in air gapped environments. Thanks to @optiz0r for reporting these issues and creating fixes. #166 #167 #168

    Source code(tar.gz)
    Source code(zip)
    harvest-21.05.3-2.amd64.deb(30.40 MB)
    harvest-21.05.3-2.x86_64.rpm(25.09 MB)
    harvest-21.05.3-2_linux_amd64.tar.gz(33.47 MB)
  • v21.05.2(Jun 14, 2021)

    Change Log

    Releases

    21.05.2 / 2021-06-14

    This release adds support for user-defined URLs for InfluxDB exporter, a new command to validate your harvest.yml file, improved logging, panic handling, and collector documentation. We also enabled GitHub security code scanning for the Harvest repo to catch issues sooner. These scans happen on every push.

    There are also several quality-of-life bug fixes listed below.

    Fixes

    • Handle special characters in cluster credentials #79
    • TLS server verification works with basic auth #51
    • Collect metrics from all disk shelves instead of one #75
    • Disk serial number and is-failed are missing from cdot query #60
    • Ensure collectors and pollers recover from panics #105
    • Cluster status is initially reported, but then stops being reported #66
    • Performance metrics don't display volume names #40
    • Allow insecure Grafana TLS connections via --insecure and honor the requested transport. See harvest grafana --help for details #111
    • Prometheus dashboards don't load when exemplar is true. Thanks to @sevenval-admins, @florianmulatz, and @unbreakabl3 for their help tracking this down and suggesting a fix. #96
    • harvest stop does not stop pollers that have been renamed #20
    • Harvest stops working after reboot on rpm/deb #50
    • harvest start shall start as harvest user in rpm/deb #129
    • harvest start detects stale pidfiles and makes start idempotent #123
    • Don't include unknown metrics when talking with older versions of ONTAP #116

    Enhancements

    • InfluxDB exporter supports user-defined URLs
    • Add workload counters to ZapiPerf #9
    • Add new command to validate harvest.yml file and optionally redact sensitive information #16 e.g. harvest doctor --config ./harvest.yml
    • Improve documentation for Unix, Zapi, and ZapiPerf collectors
    • Add Zerolog framework for structured logging #61
    • Vendor 3rd party code to increase reliability and make it easier to build in air-gapped environments #26
    • Make contributing easier with a digital CCLA instead of 1970's era PDF :)
    • Enable GitHub security code scanning
    • InfluxDB exporter provides the option to pass the URL end-point unchanged. Thanks to @steverweber for their suggestion and validation. #63
    Source code(tar.gz)
    Source code(zip)
    harvest-21.05.2-1.amd64.deb(68.21 MB)
    harvest-21.05.2-1.x86_64.rpm(74.93 MB)
    harvest-21.05.2-1_linux_amd64.tar.gz(75.28 MB)
  • v21.05.1(May 20, 2021)

    Change Log

    Releases

    21.05.1 / 2021-05-20

    Announcing the release of Harvest2. With this release, the core of Harvest has been completely rewritten in Go. Harvest2 replaces Harvest 1.6 and earlier.

    If you're using one of the Harvest 2.x release candidates, you can do a direct upgrade.

    Going forward, Harvest2 will follow a year.month.fix release naming convention, with the first release being 21.05.0. See SUPPORT.md for details.

    IMPORTANT v21.05 increased Harvest's out-of-the-box security posture: self-signed certificates are now rejected by default. You have two options:

    1. Set up client certificates for each cluster
    2. Disable the TLS check in Harvest. To do so, edit harvest.yml and add use_insecure_tls: true to each poller, or add it to the Defaults section. This tells Harvest to ignore invalid TLS certificates.
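    To illustrate option 2, here is a minimal harvest.yml sketch that disables the TLS check for every poller via the Defaults section. The poller name, address, and credentials below are placeholders; check the Harvest documentation for the full list of poller keys.

    ```yaml
    # harvest.yml (hypothetical poller name, address, and credentials)
    Defaults:
      use_insecure_tls: true   # accept self-signed/invalid TLS certificates

    Pollers:
      cluster-01:
        datacenter: dc-01
        addr: 10.0.0.10
        auth_style: basic_auth
        username: harvest
        password: "********"
    ```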

    IMPORTANT The RPM and Debian packages will be deprecated in the future and replaced with Docker and native packages.

    IMPORTANT Harvest 1.6 has reached end of support. We recommend upgrading to Harvest 21.05 to take advantage of the improvements.

    Changes since rc2

    Fixes

    • Log mistyped exporter names and continue, instead of stopping
    • harvest grafana should work with custom harvest.yml files passed via --config
    • Harvest will try harder to stop pollers when they're stuck
    • Add Grafana version check to ensure Harvest can talk to a supported version of Grafana
    • Normalize rate counter calculations, improving latency values
    • Improve workload latency calculations by using related objects' operations
    • Make CLI flags consistent across programs and subcommands
    • Reduce aggressive logging; if the first object has fatal errors, abort to avoid repetitive errors
    • Throw an error when use_insecure_tls is false and no certificates are set up for the cluster
    • harvest status fails to print the port number after restart
    • RPM install should create required directories
    • Collector now warns if it falls behind schedule
    • package.sh fails without an internet connection
    • Version flag is missing a newline on some shells #4
    • Poller should not ignore --config #28

    Enhancements

    • Add new exporter for InfluxDB
    • Add native install package
    • Add ARCHITECTURE.md and improve overall documentation
    • Use systemd harvest.service on RPM and Debian installs to manage Harvest
    • Add runtime profiling support - off by default, enabled with --profiling flag. See harvest start --help for details
    • Document how to use ONTAP client certificates for password-less polling
    • Add per-poller Prometheus end-point support with promPort
    • The release, commit, and build date information is baked into the release executables
    • You can manage a subset of pollers by passing their names to harvest, e.g. harvest start|stop|restart POLLERS
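    As a sketch of the per-poller Prometheus end-point enhancement, a harvest.yml fragment might look like the following. The poller name, address, and port are hypothetical, and the key is spelled as the release notes give it (promPort); consult the Prometheus exporter documentation for your version's exact key names.

    ```yaml
    # Hypothetical fragment: give this poller its own Prometheus end-point.
    Pollers:
      cluster-01:
        addr: 10.0.0.10
        exporters:
          - prometheus
        promPort: 12990   # port where this poller serves its metrics
    ```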
    Assets
    • harvest-21.05.1-1.amd64.deb (58.03 MB)
    • harvest-21.05.1-1.tar.gz (63.06 MB)
    • harvest-21.05.1-1.x86_64.rpm (62.66 MB)
  • v21.05.0 (May 20, 2021)


    21.05.0 / 2021-05-20

    Announcing the release of Harvest2. With this release, the core of Harvest has been completely rewritten in Go. Harvest2 replaces Harvest 1.6 and earlier.

    If you're using one of the Harvest 2.x release candidates, you can upgrade directly.

    Going forward, Harvest2 will follow a year.month.fix release naming convention, with the first release being 21.05.0. See SUPPORT.md for details.

    IMPORTANT v21.05 increased Harvest's out-of-the-box security posture: self-signed certificates are now rejected by default. You have two options:

    1. Set up client certificates for each cluster
    2. Disable the TLS check in Harvest. To do so, edit harvest.yml and add use_insecure_tls: true to each poller, or add it to the Defaults section. This tells Harvest to ignore invalid TLS certificates.

    IMPORTANT The RPM and Debian packages will be deprecated in the future and replaced with Docker and native packages.

    IMPORTANT Harvest 1.6 has reached end of support. We recommend upgrading to Harvest 21.05 to take advantage of the improvements.

    Changes since rc2

    Fixes

    • Log mistyped exporter names and continue, instead of stopping
    • harvest grafana should work with custom harvest.yml files passed via --config
    • Harvest will try harder to stop pollers when they're stuck
    • Add Grafana version check to ensure Harvest can talk to a supported version of Grafana
    • Normalize rate counter calculations, improving latency values
    • Improve workload latency calculations by using related objects' operations
    • Make CLI flags consistent across programs and subcommands
    • Reduce aggressive logging; if the first object has fatal errors, abort to avoid repetitive errors
    • Throw an error when use_insecure_tls is false and no certificates are set up for the cluster
    • harvest status fails to print the port number after restart
    • RPM install should create required directories
    • Collector now warns if it falls behind schedule
    • package.sh fails without an internet connection
    • Version flag is missing a newline on some shells #4

    Enhancements

    • Add new exporter for InfluxDB
    • Add native install package
    • Add ARCHITECTURE.md and improve overall documentation
    • Use systemd harvest.service on RPM and Debian installs to manage Harvest
    • Add runtime profiling support - off by default, enabled with --profiling flag. See harvest start --help for details
    • Document how to use ONTAP client certificates for password-less polling
    • Add per-poller Prometheus end-point support with promPort
    • The release, commit, and build date information is baked into the release executables
    • You can manage a subset of pollers by passing their names to harvest, e.g. harvest start|stop|restart POLLERS
    Assets
    • harvest-21.05.0-1.amd64.deb (58.00 MB)
    • harvest-21.05.0-1.tar.gz (63.06 MB)
    • harvest-21.05.0-1.x86_64.rpm (62.65 MB)
Owner
NetApp