A set of tests to check compliance with the Prometheus Remote Write specification

Issues
  • promql+alert_generator: Dockerized the toolset, updated docs.

    Additionally updated docs.

    The motivation is that we can now set up automated tests; in the future we might want to provide Docker images with an exact set of test cases in a versioned image.

    Signed-off-by: Bartlomiej Plotka

    opened by bwplotka 9
  • Epic: Alert Generator Compliance Test Suite

    Based on the specification, here is the list of all the high-level cases that need to be covered by the test suite. In all cases, the content of the alerts, APIs, and time series is checked for correctness.

    • [x] Presence of all the template variables and functions as described in the specification (across all the rules, not all in a single rule).
      • Data
        • [x] $labels.something .Labels.something
        • [x] $value .Value
      • Queries
        • [x] query
        • [x] first
        • [x] label
        • [x] value
        • [x] sortByLabel
      • Numbers
        • [x] humanize
        • [x] humanize1024
        • [x] humanizeDuration
        • [x] humanizePercentage
        • [x] humanizeTimestamp
      • Strings
        • [x] title
        • [x] toUpper
        • [x] toLower
        • [x] stripPort
        • [x] match
        • [x] reReplaceAll
        • [x] parseDuration
      • Others
        • [x] args
      • Undocumented and/or not needed:
        • strvalue (undocumented, not needed)
        • pathPrefix (not needed, only in consoles, and also undocumented)
        • .ExternalLabels $externalLabels (not needed)
        • .ExternalURL $externalURL (not needed)
        • graphLink (not needed)
        • tableLink (not needed)
        • tmpl (not needed, only in consoles)
        • safeHtml (not needed, only in consoles)
    • [x] Alert that goes from pending->firing->inactive.
    • [x] Alert that goes from pending->inactive.
    • [x] Rule that never becomes active (i.e. alerts in pending or firing)
    • [x] pending alerts having changing annotation values (checked via API)
    • [x] firing and inactive alerts being sent when they first went into those states.
    • [x] firing alert being re-sent at expected intervals when the alert is active with changing annotation contents.
    • [x] inactive alert being re-sent at expected intervals up to a certain time and not after that.
    • [x] Alert that goes directly to firing state (skipping the pending state) because of zero for duration.
    • [x] Alert that becomes active after having fired already and gone into the inactive state, for both the cases where the for duration is zero and non-zero. Here we should test two cases: one where the inactive alert was still being sent, and hence should stop being sent; and one where the inactive alert was not being sent anymore.
    • [x] Rule that produces new alerts that go from pending->firing->inactive while already having active alerts.
    • [x] When the for duration is non-zero and less than the evaluation interval, firing alert must be sent after the second evaluation of the rule and not before.
    • [x] A rule group having rules which are dependent on the ALERTS series from the rules above it in the same group.
    • [x] Expansion of templates in annotations only uses the labels from the query result as source data, even if those labels get overridden by the rules. It does not use the rules' additional labels.
    • [x] Alert goes into inactive when there is no more data. Both when in firing and pending.
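
As a rough illustration of the first item in the list, the sketch below expands an annotation template with Go's standard text/template package. It is not the test suite's or Prometheus's implementation: the `alertData` type, the `expand` helper, and the sample labels are hypothetical, and only `toUpper` and `toLower` are registered, as plain stand-ins for the functions of the same name.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// alertData is a hypothetical stand-in for the data passed to a template.
type alertData struct {
	Labels map[string]string
	Value  float64
}

// expand renders an annotation template. The prefix defines $labels and
// $value so that both the "$" and the "." spellings can be used.
func expand(text string, data alertData) (string, error) {
	funcs := template.FuncMap{
		"toUpper": strings.ToUpper,
		"toLower": strings.ToLower,
	}
	prefix := "{{$labels := .Labels}}{{$value := .Value}}"
	tmpl, err := template.New("annotation").Funcs(funcs).Parse(prefix + text)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	data := alertData{
		Labels: map[string]string{"instance": "host-1", "job": "API"},
		Value:  1.5,
	}
	out, err := expand("{{ $labels.instance | toUpper }} ({{ .Labels.job | toLower }}) is at {{ $value }}", data)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

A compliance case would compare the expanded annotation against the expected string for each variable and function.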

    All time comparisons will be done within a certain acceptable delta and need not be exact.

    opened by codesome 8
  • [alert generator] question about `alertname` label

    The spec says the following:

    The alert name from the alerting rule (HighRequestLatency from the example above) MUST be added to the labels of the alert with the label name as alertname. It MUST override any existing alertname label.
    

    The statement above says that the rule's name should override any existing alertname label. Does it mean that in templates the $labels.alertname and .Labels.alertname values should behave in the same way? One of the test cases expects the template value to be equal to the existing alertname label (see https://github.com/prometheus/compliance/blob/c7c726de89973d77cb491faa1b32cfddf7dcde8a/alert_generator/cases/case_new_alerts_and_order_check.go#L254), which seems to contradict what the spec says.
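
One possible reading, sketched below, is that the override applies to the labels of the emitted alert while templates still see the labels of the query result. This is only an illustration of that reading, not an answer from the spec: `buildAlertLabels` and the sample label values are hypothetical.

```go
package main

import "fmt"

// buildAlertLabels merges the query result labels with the rule's extra
// labels and then sets alertname from the rule name, overriding any
// alertname that was already present.
func buildAlertLabels(ruleName string, queryLabels, ruleLabels map[string]string) map[string]string {
	out := make(map[string]string, len(queryLabels)+len(ruleLabels)+1)
	for k, v := range queryLabels {
		out[k] = v
	}
	for k, v := range ruleLabels {
		out[k] = v
	}
	out["alertname"] = ruleName
	return out
}

func main() {
	// Hypothetical series that already carries an alertname label.
	queryLabels := map[string]string{"alertname": "FromSeries", "instance": "a"}
	alertLabels := buildAlertLabels("HighRequestLatency", queryLabels, nil)

	fmt.Println("label on the sent alert:", alertLabels["alertname"])
	// Under this reading, templates are expanded against queryLabels,
	// so $labels.alertname would still be the original value.
	fmt.Println("value seen by templates:", queryLabels["alertname"])
}
```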

    opened by hagen1778 4
  • Import the PromLabs PromQL compliance tester

    This is a pretty minimal import without too many changes, to get it moved before doing anything more.

    The first commit imports it completely unchanged, the second one just fixes minimal things to adjust to the new repo location.

    opened by juliusv 4
  • Allow config file splitting, update test cases

    Sorry, this is in one commit because I didn't do a separate update of test cases in the old, non-split config file.

    You can now pass -config-file multiple times, which causes the given files to be concatenated before YAML parsing happens.

    Signed-off-by: Julius Volz

    opened by juliusv 3
  • Alert Generator Compliance Specification 1.0

    This PR adds a specification for alert-generator compliance which was open for public review at https://docs.google.com/document/d/1QyGA3c0Eys9rZRMEbSSXcd1C0yuph8wCQ6B3Dc90rk0/edit

    opened by codesome 3
  • Add test for retry behaviour: should retry 5xx, should not retry 4xx.

    --- FAIL: TestRemoteWrite (50.30s)
        --- PASS: TestRemoteWrite/grafana (0.00s)
            --- PASS: TestRemoteWrite/grafana/Retries400 (10.07s)
            --- PASS: TestRemoteWrite/grafana/Retries500 (10.08s)
        --- FAIL: TestRemoteWrite/otelcollector (0.01s)
            --- PASS: TestRemoteWrite/otelcollector/Retries400 (10.02s)
            --- FAIL: TestRemoteWrite/otelcollector/Retries500 (10.02s)
        --- PASS: TestRemoteWrite/prometheus (0.01s)
            --- PASS: TestRemoteWrite/prometheus/Retries400 (10.05s)
            --- PASS: TestRemoteWrite/prometheus/Retries500 (10.10s)
        --- FAIL: TestRemoteWrite/telegraf (0.01s)
            --- PASS: TestRemoteWrite/telegraf/Retries400 (10.02s)
            --- FAIL: TestRemoteWrite/telegraf/Retries500 (10.02s)
        --- FAIL: TestRemoteWrite/vector (0.01s)
            --- PASS: TestRemoteWrite/vector/Retries400 (10.03s)
            --- FAIL: TestRemoteWrite/vector/Retries500 (10.03s)
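
The rule being tested can be written as a one-line classification of the response status. The sketch below only restates the rule from the title; `shouldRetry` is a hypothetical helper, not a function from any of the senders listed in the output above.

```go
package main

import "fmt"

// shouldRetry reports whether a remote write request that got this HTTP
// status should be sent again: 5xx yes, 4xx (and everything else) no.
func shouldRetry(statusCode int) bool {
	return statusCode >= 500 && statusCode <= 599
}

func main() {
	for _, code := range []int{200, 400, 404, 500, 503} {
		fmt.Printf("%d retry=%v\n", code, shouldRetry(code))
	}
}
```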
    

    Signed-off-by: Tom Wilkie

    opened by tomwilkie 3
  • [alert_generator] "mismatch in EndsAt" error question

    The alert_generator test suite checks the received alerts for the correctness of their properties. One of those checks verifies that the EndsAt param is within the time range between now (when the alert was received by alert_generator) and now+delta, where delta is usually 4*resendDelay - see https://github.com/prometheus/compliance/blob/main/alert_generator/cases/expected_alert.go#L80-L96

    However, the time when an alert was received isn't always the time when the alert was triggered. Since Prometheus aligns the time slots in which alerts are evaluated, the real time and the timestamp of alert execution can differ - see https://github.com/prometheus/prometheus/blob/580e852f1028ecbcaa67836f2da5230ac7c35fd0/rules/manager.go#L411-L419

    Does this mean that alert_generator should calculate the EndsAt param based on the alert's ActiveAt param instead of the time when the alert was actually received?

    opened by hagen1778 2
  • Don't require the up metric for non-up-metric tests.

    It's a bit harsh to fail agents for not providing the up metric in unrelated tests. So remove that check, and use other signals. This makes the NameLabel test pass for Telegraf and Otel.

    Signed-off-by: Tom Wilkie

    opened by tomwilkie 2
  • alert_generator: add `vmalert` config

    Add a testing configuration for the VictoriaMetrics vmalert component, which can be used as an alert generator. vmalert can be configured to use Prometheus as remote storage for querying alerts and writing back results.

    opened by hagen1778 1
  • Allow querying for negative offsets

    Prometheus 2.33 started to allow negative offsets, so we need to mark queries with negative offsets as valid now.

    Signed-off-by: Julius Volz

    opened by juliusv 1
  • [alert_generator] Add docs on how to use the test suite

    Similar to the PromQL tests, I don't plan on building binaries and Docker images in the CI and would rather have a bunch of Go CLI commands in the docs that can be run to use the test suite.

    opened by codesome 0
  • Prometheus is not fully compatible with OpenMetrics tests

    What did you do?

    We want to ensure OpenMetrics / Prometheus compatibility in the OpenTelemetry Collector. We have been building compatibility tests to verify the OpenMetrics spec is fully supported on both the OpenTelemetry Collector Prometheus receiver and PRW exporter as well as in Prometheus itself.

    We used the OpenMetrics metrics test data available at https://github.com/OpenObservability/OpenMetrics/tree/main/tests/testdata/parsers

    Out of a total of 161 negative tests in OpenMetrics, 94 tests pass (these tests are dropped) with an 'up' value of 0; 67 tests are not dropped and have an 'up' value of 1 and 22 tests have incorrectly ingested metrics.

    In order to test Prometheus itself, we set up a metrics HTTP endpoint that exposes invalid/bad metrics from the OpenMetrics tests. We then configured Prometheus 2.31.0 to scrape the metrics endpoint.

    What did you expect to see?

    Expected result: The scrape should fail since the target has an invalid metric, and the appropriate error should be reported.

    For example, with the following metric data: bad_counter_values_1 (https://raw.githubusercontent.com/OpenObservability/OpenMetrics/main/tests/testdata/parsers/bad_counter_values_1/metrics)

    # TYPE a counter
    a_total -1
    # EOF
    

    What did you see instead? Under which circumstances?

    Current behavior: Scrape is successful. There are multiple bad test cases that are scraped successfully by Prometheus.

    For example, using bad_counter_values_1 (#5 listed below) does not show an error even though it is a negative counter value. According to the OpenMetrics tests, this metric should not be parsed.
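
The kind of check that would reject this sample can be sketched as follows. This is a deliberately minimal illustration, not the OpenMetrics or Prometheus parser: `checkCounterSample` is a hypothetical helper that only understands a bare "name value" line with no labels or timestamp.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// checkCounterSample rejects counter samples whose value is negative or NaN.
func checkCounterSample(line string) error {
	fields := strings.Fields(line)
	if len(fields) != 2 {
		return fmt.Errorf("expected \"name value\", got %q", line)
	}
	v, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return fmt.Errorf("invalid value %q: %w", fields[1], err)
	}
	if v < 0 || math.IsNaN(v) {
		return fmt.Errorf("counter %s has invalid value %v", fields[0], v)
	}
	return nil
}

func main() {
	for _, line := range []string{"a_total 1", "a_total -1"} {
		if err := checkCounterSample(line); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", line)
	}
}
```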

    [Screenshot, 2021-11-03 2:49:52 PM]

    You can see no error has been reported and the scrape is successful.

    [Screenshot, 2021-11-03 2:50:20 PM]

    Similar to bad_counter_values_1 test case, there are multiple bad test cases where the scrape is successful and metrics are ingested by Prometheus:

    1. bad_missing_or_extra_commas_0
    2. bad_metadata_in_wrong_place_1
    3. bad_counter_values_18
    4. bad_grouping_or_ordering_9
    5. bad_counter_values_1
    6. bad_histograms_2
    7. bad_counter_values_16
    8. bad_value_1
    9. bad_missing_or_extra_commas_2
    10. bad_invalid_labels_6
    11. bad_grouping_or_ordering_8
    12. bad_metadata_in_wrong_place_0
    13. bad_grouping_or_ordering_10
    14. bad_grouping_or_ordering_0
    15. bad_value_2
    16. bad_metadata_in_wrong_place_2
    17. bad_text_after_eof_1
    18. bad_value_3
    19. bad_counter_values_0
    20. bad_grouping_or_ordering_3
    21. bad_histograms_3
    22. bad_blank_line

    Environment

    • System information:

    Darwin 20.6.0 x86_64

    • Prometheus version:

    version=2.31.0

    • Prometheus configuration file:
    global:
      scrape_interval: 5s
    
    scrape_configs:
      - job_name: "open-metrics-scrape"
        static_configs:
          - targets: ["localhost:3000"]
    
    

    cc: @PaurushGarg @mustafain117

    opened by alolita 9
  • Remote write CI tests always failing

    The remote write CI tests seem to always be failing, which is annoying, especially on unrelated PRs (e.g. for PromQL):

    [Image: remote-write-test-failures]

    @tomwilkie would you be able to look into this and either fix or disable them?

    bug 
    opened by juliusv 0
  • otel remote_write fails when run with -race

    When running the remote_write tests for the OpenTelemetry collector with the -race flag:

    go test -race --tags=compliance -run "TestRemoteWrite/otel/.+" -v ./
    

    The tests fail due to data races: https://gist.github.com/kirbyquerby/59e4e57d59ba131307fb3d9ac2a4e35d

    This issue was found when adding these tests to the OpenTelemetry collector (open-telemetry/opentelemetry-collector-contrib/pull/5014), as the tests in the collector are run with -race.

    opened by kirbyquerby 0
  • alertmanager

    A lot of alertmanager users are experiencing issues due to a version mismatch between alertmanager and amtool.

    https://github.com/prometheus/alertmanager/pull/2672 introduces a way of detecting this, but some third party tools like cortex do not reply with the alertmanager version when queried on the alertmanager version endpoint, instead pointing to their own version.

    We should, in the alertmanager compliance, state that the version endpoint must return the alertmanager release.

    opened by roidelapluie 0
Owner
Prometheus