Prometheus rule linter

Overview

pint

pint is a Prometheus rule linter.

Usage

There are two modes it works in:

  • CI PR linting
  • Ad-hoc linting of selected files or directories

Pull Requests

Pull request linting currently supports git: pint finds all commits on the current branch that are not present in the parent branch and scans every file modified by those commits.
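
The commit and file discovery described above can be approximated with plain git commands. This is a minimal sketch, not pint's actual implementation: the function names `branch_commits` and `changed_files` are hypothetical, and the parent branch name is an argument you supply.

```shell
# Hypothetical helpers, not part of pint: approximate which commits and
# files a pull request check would look at.

# Print the hashes of commits on the current branch that are not
# reachable from the parent branch given as $1.
branch_commits() {
  git log --format=%H "$1..HEAD"
}

# Print the files modified since the current branch diverged from the
# parent branch given as $1 (the three-dot diff compares against the
# merge base).
changed_files() {
  git diff --name-only "$1...HEAD"
}
```

For example, running `changed_files main` on a feature branch prints the paths touched by that branch, one per line.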

Results can optionally be reported through the BitBucket API, which generates a report listing any issues found. Each issue creates an inline annotation in BitBucket with a description of the issue. When this reporting is used the exit code is always zero; the report itself indicates whether the checks passed.

Ad-hoc

Lint the specified files and report any issues found.

You can lint selected files:

pint lint rules.yml

or directories:

pint lint path/to/dir

or both:

pint lint path/to/dir file.yml path/file.yml path/dir

Quick start

Requirements:

  1. Build the binary:

    git clone https://github.com/cloudflare/pint.git
    cd pint
    make build
  2. Run a simple syntax check on Prometheus alerting or recording rules file(s).

    ./pint lint /etc/prometheus/*.rules.yml
  3. A configuration file is optional, but without it pint will only run very basic syntax checks. See CONFIGURATION.md for details on config syntax and check the examples dir for sample config files. By default pint will try to load configuration from .pint.hcl; you can specify a different path using the --config flag:

    ./pint --config /etc/pint.hcl lint /etc/prometheus/rules/*.yml
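
The choice between the default config lookup and an explicit --config path can be wrapped in a small script, for example in CI. This is only a sketch under stated assumptions: `pint_args` is a hypothetical helper, not something pint ships, and it uses nothing beyond the `--config` flag and `lint` command shown above.

```shell
# Hypothetical helper, not part of pint: print the arguments to pass to
# the pint binary for a lint run.
#   $1    path to a config file (may not exist)
#   $2... files or directories to lint
pint_args() {
  config="$1"
  shift
  if [ -f "$config" ]; then
    # Custom config found: pass it explicitly with --config.
    printf '%s\n' "--config $config lint $*"
  else
    # No custom config: let pint fall back to its default .pint.hcl lookup.
    printf '%s\n' "lint $*"
  fi
}
```

It could be used as `./pint $(pint_args /etc/pint.hcl /etc/prometheus/rules)`. The unquoted substitution relies on word splitting, so paths containing spaces would need extra care.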
Comments
  • checks: series: add ignore_recordingrules option

    Add a new option ignore_recordingrules to the series check that will make it ignore errors when checking for the existence of metrics iff they are part of the same pull/merge request.

    This improves the series check for the CI use-case significantly because pull/merge requests might add new recording rules together with alerting rules, and without this option the series check fails because the newly recorded metrics cannot be found yet. With this option, pint also looks at the newly added recording rules.

    opened by GiedriusS 5
  • reporter: add GitHub support

    Add GitHub reporter support. All of the problems in the summary are submitted as (multi-)line comments on a given pull request. Add some small tests to try out this functionality. I have also done some ad-hoc tests.

    Also, exit with 1 when there are problems with severity Bug or higher. This is so that it would be possible to run pint ci in Jenkins so that it would report problems on GitHub & exit with 1 to indicate that a step has failed. All of the users which depend on the exit code being 0 can simply ignore the errors with ||: in Bash.

    cc @prymitive

    opened by GiedriusS 5
  • internal/promapi: Support environment variables in headers

    Prometheus header values are now expanded to replace environment variables, so if

    export CI_JWT="8iL6E1vh5qsGpccR"
    

    then

    prometheus "example" {
      uri = "https://prometheus.example.com"
      headers = {
        "Authorization": "Bearer $CI_JWT",
      }
    }
    

    will send requests to the Prometheus server with the HTTP header

    Authorization: Bearer 8iL6E1vh5qsGpccR
    
    opened by BenoitKnecht 4
  • too many open files

    Hello, thx for linter!

    We have repository with prometheus-alerts:

    $ find conf -type f | wc -l
    263
    

    When we run linter we got error:

    ./pint --config rules.hcl lint conf/test
    ...
    level=fatal msg="Execution completed with error(s)s" error="open conf/test/file.yml: too many open files"
    

    Update: if we run without the config parameter, it works fine:

    ./pint lint conf/test
    ...
    level=info msg="File parsed" path=conf/test/file.yml rules=1
    
    opened by juev 4
  • GitHub reporter support

    Hello! Thanks for this software. It would be ideal to support GitHub, not just BitBucket as a platform where detected issues could be reported. What do you think? Would you be open to accepting pull requests implementing such a feature?

    opened by GiedriusS 4
  • Using yaml anchors results in false positives e.g yaml/parse

    Hi @prymitive ! Thank you for pint, it looks really good!

    We (SRE at Wikimedia Foundation) are evaluating it for our Prometheus alerts currently and ran into a potential issue around YAML anchors.

    Consider the following:

    groups:
    - name: certmanager
      rules:
      - &CertManagerCertExpirySoon
        alert: CertManagerCertExpirySoon
        expr: certmanager_certificate_expiration_timestamp_seconds - time() < (9 * 24 * 3600)
        for: 5m
        labels:
          team: sre
          severity: warning
        annotations:
          summary: "Certificate {{ $labels.namespace }}/{{ $labels.name }} in is about to expire"
          description: "The certificate {{ $labels.name }} in namespace {{ $labels.namespace }} is {{ $value | humanizeDuration }} from expiry. It should have been refreshed 9 days before expiry"
          runbook: https://wikitech.wikimedia.org/wiki/Kubernetes/cert-manager
      - <<: *CertManagerCertExpirySoon
        expr: certmanager_certificate_expiration_timestamp_seconds - time() < (7 * 24 * 3600)
        labels:
          team: sre
          severity: critical
    

    Running pint lint will result in a yaml/parse error:

    incomplete rule, no alert or record key (yaml/parse)                                                                                                                               
    ||     expr: certmanager_certificate_expiration_timestamp_seconds - time() < (7 * 24 * 3600)
    

    Even though the rule does have an alert clause after resolving anchors. What do you think?

    opened by filippog 3
  • RFE: Allow selecting alerting rules based on whether or not they have a 'for:'

    Prometheus alerting rules can have a 'for:'. In our environment, all alert rules with a for: should have an additional label, and all alert rules without it should lack that label. It would be nice if Pint could check for things like this or otherwise distinguish between for and for-less alert rules (who knows, someone might have a rule that no alerts should use for:).

    As an extra bonus feature it would be nice if the for: duration was available for lint checks in some way, or you could verify that some label had the same duration as the for:. Our additional label should actually have the same duration value as the for: (we use it in alert messages to say how long ago the alert condition started to be true), and I can imagine that some people might want to require either minimum or maximum for: durations.

    opened by siebenmann 3
  • Re-Review post issues

    Hi! Got an error after new commits after the first PR review. Steps to reproduce:

    1. Create a PR with some errors in the prometheus targets file;
    2. Get your first PR review from pint;
    3. Change something else and push commits to PR;
    4. You will receive this error:

    level=fatal msg="Execution completed with error(s)" error="submitting reports: creating review: POST https://api.github.com/repos/xxx/yyy/pulls/95/reviews: 422 Unprocessable Entity [{Resource: Field: Code: Message:Pull request review thread line must be part of the diff, Pull request review thread start line must be part of the same hunk as the line., and Pull request review thread diff hunk can't be blank}]"

    opened by rk-p2p 2
  • Lost support for linting rules organized into PrometheusRule files

    My team deploys the Prometheus stack on Kubernetes and uses the Prometheus Operator and its PrometheusRule CRD for managing rules. All of our rules are organized into PrometheusRule files. Example:

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: alertmanager
    spec:
      groups:
      - name: Alertmanager
        rules:
        - alert: Alertmanager failed to reload config
           ... etc ...
    

    I first started looking at Pint earlier this year around version 18. In that version, Pint is able to parse the rules out of these files. I just recently came back to actually implementing Pint in our project this week, though, and it seems that a file with this structure is no longer supported:

    level=error msg="Failed to unmarshal file content" error="yaml: unmarshal errors:\n  line 1: field apiVersion not found in type rulefmt.RuleGroups\n  line 2: field kind not found in type rulefmt.RuleGroups\n  line 3: field metadata not found in type rulefmt.RuleGroups\n  line 5: field spec not found in type rulefmt.RuleGroups" lines=1-66 path=rules/grafana.yaml
    
    rules/grafana.yaml:1: field apiVersion not found in type rulefmt.RuleGroups (yaml/parse)
     1 | apiVersion: monitoring.coreos.com/v1
    

    Do you happen to know when this support was removed, and is there possibly any built-in support or config tricks I could use to use the more recent versions without having to change how we organize our rules?

    opened by davidpellcb 2
  • support authenticated endpoints?

    Is there a way to specify HTTP auth credentials to probe a remote prometheus server? ours is protected by a password, and it seems it fails, even with a URL like https://user:[email protected]/.

    opened by anarcat 2
  • panic: runtime error: index out of range [4] with length 4

    level=debug msg="Scheduling prometheus query" query="increase(sum without (reason, status) (nzbget_history_status_count)[4h:])" uri=http://server:9090
    level=debug msg="Cache hit" key=4ac2fe43a6df450657211f8cf8c7d47775f6b543 query="increase(sum without (reason, status) (nzbget_history_status_count)[4h:])" uri=http://server:9090
    level=debug msg="Parsed response" query="increase(sum without (reason, status) (nzbget_history_status_count)[4h:])" series=1 uri=http://server:9090
    panic: runtime error: index out of range [4] with length 4
    
    goroutine 109 [running]:
    github.com/cloudflare/pint/internal/parser/utils.RemoveConditions({0xc0001df960?, 0xc000cd8fa8?})
    	github.com/cloudflare/pint/internal/parser/utils/conditions.go:43 +0x9d2
    github.com/cloudflare/pint/internal/parser/utils.RemoveConditions({0xc000cda200?, 0xc0007de0d0?})
    	github.com/cloudflare/pint/internal/parser/utils/conditions.go:13 +0x29c
    github.com/cloudflare/pint/internal/parser/utils.RemoveConditions({0xc0009f4420?, 0x10e00f0?})
    	github.com/cloudflare/pint/internal/parser/utils/conditions.go:17 +0x388
    github.com/cloudflare/pint/internal/checks.VectorMatchingCheck.checkNode({0xc000cd9970?}, {0x10e00f0, 0xc00022bf80}, 0xc00082bf40)
    	github.com/cloudflare/pint/internal/checks/promql_vector_matching.go:64 +0x5f
    github.com/cloudflare/pint/internal/checks.VectorMatchingCheck.Check({0x0?}, {0x10e00f0?, 0xc00022bf80?}, {0x0, 0xc000940840, {{0x0, 0x0}, {0x0, 0x0}, 0x0}}, ...)
    	github.com/cloudflare/pint/internal/checks/promql_vector_matching.go:50 +0x145
    main.scanWorker({0x10e00f0, 0xc00022bf80}, 0xc000256900?, 0xc000256960?)
    

    My config file is

    prometheus "poseidon" {
      uri      = "http://server:9090"
      timeout  = "60s"
      required = true
    }
    
    checks {
      disabled = ["promql/fragile"]
    }
    
    opened by frebib 2
  • Allow to use GitHub Integration without a GitHub App

    Prior to v0.39 it was possible to use the GitHub integration without a GitHub App. This allowed us to run pint in jenkins and report errors back to pull requests in our GitHub Enterprise instance. With #486 being merged this is not possible anymore:

    23:08:48  level=fatal msg="Execution completed with error(s)" error="submitting reports: failed to create a new check run: POST https://github.example.com/api/v3/repos/our-org/our-repo/check-runs: 403 You must authenticate via a GitHub App. []"
    
    

    It would be great to have this back again.

    PS: Thanks for this great tool! 🤩👍

    opened by pascal-hofmann 1
  • Provide alternative to yaml comments for configuring linting rules

    Hey, just discovered this tool, awesome work!

    We would like to use pint in watch mode for continuous scanning of our prometheus rules, and use the owner feature so we can route alerts for problems to the right team.

    Unfortunately, pint linting rules can only be configured via yaml comments, but we use jsonnet to generate our prometheus rules (serializing them to yaml via std.manifestYamlDoc()) and I couldn't find a way to add comments to the generated output. I am no jsonnet expert but it looks like the design of jsonnet doesn't even allow this.

    Our primary motivation for jsonnet is that we can use the existing monitoring-mixins ecosystem and get an improved authoring experience over raw yaml - which works pretty well so far.

    Some linters allow their rules to be configured in external config files, e.g. in PMD you can define rulesets - but I wouldn't use XML today :-) I haven't thought about how this could look in detail but in general I think such a mechanism could solve our problem.

    Would you be open to implement or accept a PR for such a feature?

    opened by mario-steinhoff-gcx 3
  • Panic, slice index out of bounds

    I just got this panic:

    panic: runtime error: slice bounds out of range [:243] with capacity 242
    
    goroutine 1 [running]:
    github.com/cloudflare/pint/internal/reporter.ConsoleReporter.Submit(0xbb2440, 0xc00011e010, 0xc0012e8000, 0xd3, 0x143, 0xbbddd0, 0xc00026cab0, 0x2, 0x2)
    	/home/joaquin/projects/personal/github/pint/internal/reporter/console.go:71 +0x1073
    main.actionLint(0xc0001a7740, 0x2, 0x2)
    	/home/joaquin/projects/personal/github/pint/cmd/pint/lint.go:47 +0x56a
    github.com/urfave/cli/v2.(*Command).Run(0xc00016d440, 0xc0001a7600, 0x0, 0x0)
    	/home/joaquin/.asdf/installs/golang/1.16.3/packages/pkg/mod/github.com/urfave/cli/[email protected]/command.go:163 +0x4dd
    github.com/urfave/cli/v2.(*App).RunContext(0xc0002321a0, 0xbbd5f0, 0xc00011a010, 0xc000124000, 0x5, 0x5, 0x0, 0x0)
    	/home/joaquin/.asdf/installs/golang/1.16.3/packages/pkg/mod/github.com/urfave/cli/[email protected]/app.go:313 +0x810
    github.com/urfave/cli/v2.(*App).Run(...)
    	/home/joaquin/.asdf/installs/golang/1.16.3/packages/pkg/mod/github.com/urfave/cli/[email protected]/app.go:224
    main.main()
    	/home/joaquin/projects/personal/github/pint/cmd/pint/main.go:72 +0x106
    
    

    on this line: https://github.com/cloudflare/pint/blob/452a61ca4aaaca44bade5302879657454d233d06/internal/reporter/console.go#L71

    Seems like the check can fail sometimes

    opened by xocasdashdash 9
Releases(v0.39.0)
Owner
Cloudflare