The Prometheus monitoring system and time series database.

Overview

Prometheus

Visit prometheus.io for the full documentation, examples and guides.

Prometheus, a Cloud Native Computing Foundation project, is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when specified conditions are observed.

The features that distinguish Prometheus from other metrics and monitoring systems are:

  • A multi-dimensional data model (time series defined by metric name and set of key/value dimensions)
  • PromQL, a powerful and flexible query language to leverage this dimensionality (see the example after this list)
  • No dependency on distributed storage; single server nodes are autonomous
  • An HTTP pull model for time series collection
  • Pushing time series is supported via an intermediary gateway for batch jobs
  • Targets are discovered via service discovery or static configuration
  • Multiple modes of graphing and dashboarding support
  • Support for hierarchical and horizontal federation
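
For example, a single PromQL expression can slice along these dimensions. A minimal sketch, assuming a hypothetical counter http_requests_total with job and status labels, that computes the per-job rate of server errors over the last five minutes:

sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))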

Architecture overview

Install

There are various ways of installing Prometheus.

Precompiled binaries

Precompiled binaries for released versions are available in the download section on prometheus.io. Using the latest production release binary is the recommended way of installing Prometheus. See the Installing chapter in the documentation for all the details.

Docker images

Docker images are available on Quay.io or Docker Hub.

You can launch a Prometheus container for trying it out with

$ docker run --name prometheus -d -p 127.0.0.1:9090:9090 prom/prometheus

Prometheus will now be reachable at http://localhost:9090/.
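
To try it with your own configuration instead of the bundled default, one common approach is to bind-mount your file over the image's config path (/etc/prometheus/prometheus.yml in the official image); the host path below is a placeholder:

$ docker run --name prometheus -d -p 127.0.0.1:9090:9090 \
    -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml \
    prom/prometheus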

Building from source

To build Prometheus from source code, first ensure that you have a working Go environment with version 1.14 or greater installed. You also need Node.js and Yarn installed in order to build the frontend assets.

You can directly use the go tool to download and install the prometheus and promtool binaries into your GOPATH:

$ go get github.com/prometheus/prometheus/cmd/...
$ prometheus --config.file=your_config.yml

However, when using go get to build Prometheus, Prometheus will expect to be able to read its web assets from local filesystem directories under web/ui/static and web/ui/templates. In order for these assets to be found, you will have to run Prometheus from the root of the cloned repository. Note also that these directories do not include the new experimental React UI unless it has been built explicitly using make assets or make build.

An example of the above configuration file can be found here.
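
As a rough starting point, a minimal your_config.yml that has Prometheus scrape itself might look like the sketch below (the interval and target are placeholders to adjust):

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']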

You can also clone the repository yourself and build using make build, which will compile in the web assets so that Prometheus can be run from anywhere:

$ mkdir -p $GOPATH/src/github.com/prometheus
$ cd $GOPATH/src/github.com/prometheus
$ git clone https://github.com/prometheus/prometheus.git
$ cd prometheus
$ make build
$ ./prometheus --config.file=your_config.yml

The Makefile provides several targets:

  • build: build the prometheus and promtool binaries (includes building and compiling in web assets)
  • test: run the tests
  • test-short: run the short tests
  • format: format the source code
  • vet: check the source code for common errors
  • docker: build a docker container for the current HEAD
  • assets: build the new experimental React UI

React UI Development

For more information on building, running, and developing on the new React-based UI, see the React app's README.md.

More information

Contributing

Refer to CONTRIBUTING.md

License

Apache License 2.0, see LICENSE.

Issues
  • Remote storage

    Prometheus needs to be able to interface with a remote and scalable data store for long-term storage/retrieval.

    kind/enhancement 
    opened by juliusv 170
  • TSDB data import tool for OpenMetrics format.

    Created a tool to import data formatted according to the Prometheus exposition format. The tool can be accessed via the TSDB CLI.

    closes prometheus/prometheus#535

    Signed-off-by: Dipack P Panjabi [email protected]

    (Port of https://github.com/prometheus/tsdb/pull/671)
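
    For context, OpenMetrics-formatted input looks roughly like the sketch below (the metric, values, and timestamps are made up; timestamps are in seconds, and the # EOF marker is required):

    # HELP http_requests Total number of HTTP requests.
    # TYPE http_requests counter
    http_requests_total{code="200"} 1027 1612345678
    http_requests_total{code="500"} 3 1612345678
    # EOF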

    opened by dipack95 123
  • Add mechanism to perform bulk imports

    Currently the only way to bulk-import data is a hacky one involving client-side timestamps and scrapes with multiple samples per time series. We should offer an API for bulk import. This relies on https://github.com/prometheus/prometheus/issues/481.

    EDIT: It probably won't be a web-based API in Prometheus, but a command-line tool.
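
    For reference, this eventually landed as a promtool subcommand; a hedged usage sketch, with the input file and output directory names made up:

    $ promtool tsdb create-blocks-from openmetrics input.om ./data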

    kind/enhancement priority/P2 component/tsdb 
    opened by juliusv 112
  • Create a section ANNOTATIONS with user-defined payload and generalize RUNBOOK, DESCRIPTION, SUMMARY into fields therein.

    RUNBOOK was added in a hurry in #843 for an internal demo of one of our users, which didn't give it enough time to be fully discussed. The demo has been done, so we can reconsider this.

    I think we should revert this change, and remove RUNBOOK:

    • Our general policy is that if it can be done with labels, do it with labels
    • All notification methods in the alertmanager will need extra code to deal with this
    • In future, all alertmanager notification templates will need extra code to deal with this
    • In general, all user code touching the alertmanager will need extra code to deal with this
    • This presumes a certain workflow in that you have something called a "runbook" (and not any other name - playbook is also common) and that you have exactly one of them

    Runbooks are not a fundamental aspect of an alert, are not in use by all of our users and thus I don't believe they meet the bar for first-class support within prometheus. This is especially true considering that they don't add anything that isn't already possible with labels.

    opened by brian-brazil 102
  • Implement strategies to limit memory usage.

    Currently, Prometheus simply limits the chunks in memory to a fixed number.

    However, this number doesn't directly imply the total memory usage as many other things take memory as well.

    Prometheus could measure its own memory consumption and (optionally) evict chunks early if it needs too much memory.

    It's non-trivial to measure "actual" memory consumption in a platform independent way.
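
    For example, even the numbers Go itself reports diverge from what the OS sees. A minimal self-inspection sketch using runtime.ReadMemStats:

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        // HeapAlloc counts live heap memory; Sys is what the Go runtime has
        // obtained from the OS. Neither equals the RSS the kernel charges us.
        fmt.Printf("HeapAlloc = %d bytes, Sys = %d bytes\n", m.HeapAlloc, m.Sys)
    }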

    kind/enhancement 
    opened by beorn7 90
  • '@ <timestamp>' modifier

    This PR implements @ <timestamp> modifier as per this design doc.

    An example query:

    rate(process_cpu_seconds_total[1m]) 
      and
    topk(7, rate(process_cpu_seconds_total[1h] @ 1234))
    

    which ranks series by their 1h rate as of Unix timestamp 1234, but actually plots the 1m rate.

    Closes #7903

    This PR is to be followed up with an easier way to represent the start, end, range of a query in PromQL so that we could do @ <end>, metric[<range>] easily.

    opened by codesome 86
  • Add option to log slow queries and recording rules

    opened by AlekSi 86
  • Add offset to selectParams

    This adds the Offset from the promql.EvalStmt to the selectParams which is sent to the querier during Select()

    Fixes #4224

    opened by jacksontj 80
  • Port isolation from old TSDB PR

    The original PR was https://github.com/prometheus/tsdb/pull/306 .

    I tried to carefully adjust to the new world order, but please give this a very careful review, especially around iterator reuse (marked with a TODO).

    On the bright side, I definitely found and fixed a bug in txRing.

    prombench 
    opened by beorn7 78
  • 2.3.0 significant memory usage increase.

    Bug Report

    What did you do? Upgraded to 2.3.0

    What did you expect to see? General improvements.

    What did you see instead? Under which circumstances? Memory usage, possibly driven by queries, has increased considerably. The upgrade happened at 09:27; the drops in memory usage on the graph after that point are from container restarts due to OOM.

    container_memory_usage_bytes

    Environment

    Prometheus in kubernetes 1.9

    • System information: Standard docker containers, on docker kubelet on linux.

    • Prometheus version: 2.3.0

    kind/bug 
    opened by tcolgate 77
  • Don't sync golangci-lint GitHub Action if `Makefile.common` isn't used

    If repositories don't use Makefile.common we shouldn't enforce additional testing systems.

    For example, windows_exporter (see https://github.com/prometheus-community/windows_exporter/pull/833) doesn't use the Makefile and as a result doesn't run golangci-lint.

    opened by LeviHarrison 0
  • rm overlap, add label builder to fix name bug

    This PR fixes a few problems with the rule backfiller. One problem is the overlapping data from issue #9288. Previously the startWithAlignment could be before start time, which was causing blocks to overlap. This PR adds a fix for that.

    Another problem was that the rule backfiller would error when the data returned from the query API had a __name__ label. This PR now uses the label builder, which prevents duplicate labels, makes sure the recording rule's __name__ is used, and also orders the labels when they are not already ordered.

    opened by JessicaGreben 0
  • Use npm workspace and integrate codemirror-promql locally

    This PR changes the way the react-app and the web module are set up: it now uses npm workspaces, in order to integrate codemirror-promql locally.

    This PR introduces craco, which wraps react-scripts and allows overriding the webpack configuration.

    Note: This PR must be merged after https://github.com/prometheus/prometheus/pull/9284 and after https://github.com/prometheus/prometheus/pull/9322

    opened by Nexucis 1
  • remove app package in codemirror-module

    Removing the app package in the codemirror-promql module will simplify the module itself and will help with the migration to npm workspaces.

    opened by Nexucis 0
  • Service discovery provider should not be reinitialized on /-/reload if its config is unchanged

    What did you do?

    • Run prometheus with k8s_sd enabled. k8s_sd is set to supply many thousands of targets (100k+).
    • Change some configuration option unrelated to k8s_sd and call /-/reload every once in a while

    What did you expect to see? Everything should just work; k8s_sd should not be impacted in any way.

    What did you see instead? Under which circumstances? k8s_sd is completely reinitialized, causing targets to be deleted and re-added. Due to the large number of targets, this process is far from instantaneous, which causes unnecessary scrape disruptions.

    • Prometheus version: 2.29.1, built with go1.16.7

    • Logs: "Using pod service account via in-cluster config" messages correlate with spikes in the prometheus_sd_discovered_targets metric

    level=info ts=2021-09-01T16:11:56.267Z caller=main.go:972 msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
    level=info ts=2021-09-01T16:11:56.301Z caller=kubernetes.go:282 component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
    level=info ts=2021-09-01T16:11:59.861Z caller=main.go:1009 msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml totalDuration=3.593115341s db_storage=1.312µs remote_storage=4.347µs web_handler=837ns query_engine=1.758µs scrape=15.933511ms scrape_sd=2.261564ms notify=31.551µs notify_sd=24.808µs rules=3.557461253s
    level=info ts=2021-09-01T16:13:56.260Z caller=main.go:972 msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
    level=info ts=2021-09-01T16:13:56.300Z caller=kubernetes.go:282 component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
    level=info ts=2021-09-01T16:14:01.791Z caller=main.go:1009 msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml totalDuration=5.530782819s db_storage=1.476µs remote_storage=4.773µs web_handler=646ns query_engine=1.611µs scrape=25.965843ms scrape_sd=2.262207ms notify=28.365µs notify_sd=23.784µs rules=5.488814021s
    level=info ts=2021-09-01T16:15:56.260Z caller=main.go:972 msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
    level=info ts=2021-09-01T16:15:56.300Z caller=kubernetes.go:282 component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
    level=info ts=2021-09-01T16:16:02.829Z caller=main.go:1009 msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml totalDuration=6.568657024s db_storage=2.078µs remote_storage=4.146µs web_handler=1.231µs query_engine=1.529µs scrape=23.191811ms scrape_sd=4.595331ms notify=26.713µs notify_sd=21.314µs rules=6.52515695s
    
    opened by krya-kryak 1
  • WIP: Add initial support for exemplar to the remote write receiver endpoint

    This is the initial implementation of #9317.

    This PR is currently prefixed with WIP: I wanted to get early feedback on the implementation, since this is my first contribution to this project. Thank you in advance.

    NOTE: I will add tests after receiving feedback on the implementation.

    opened by secat 0
  • Add exemplars support on the remote write receiver endpoint

    Proposal

    The Prometheus remote write receiver endpoint currently does not append exemplars. This issue proposes to add exemplars support on the remote write receiver.

    According to @csmarchbanks, it should be fairly easy to add a loop for exemplars like the one done for samples here: https://github.com/prometheus/prometheus/blob/0111aa987e600c861dff34e4d24e891ed63d0535/storage/remote/write_handler.go#L77

    It should also take into account whether the exemplar-storage feature is enabled.
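
    A rough sketch of what such a loop could look like (hypothetical, not the actual write_handler.go code; it assumes the package-local labelProtosToLabels helper and an appender app exposing AppendExemplar):

    // Hypothetical sketch; helper and variable names are assumptions.
    for _, ts := range req.Timeseries {
        lbls := labelProtosToLabels(ts.Labels)
        for _, ep := range ts.Exemplars {
            e := exemplar.Exemplar{
                Labels: labelProtosToLabels(ep.Labels),
                Value:  ep.Value,
                Ts:     ep.Timestamp,
                HasTs:  true,
            }
            // May fail when exemplar storage is disabled; handle accordingly.
            if _, err := app.AppendExemplar(0, lbls, e); err != nil {
                return err
            }
        }
    }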

    opened by secat 2
  • Prometheus two replica pod(k8s) with duplication metrics data.

    Currently we have two Prometheus pods on a K8s cluster collecting metrics from VM servers, and it is working fine. We are now deciding whether both pods should keep collecting metrics from the VMs, or only one Prometheus pod. We want to know: if two Prometheus pods (replicas: 2) run on the cluster, will the metrics data be duplicated? Can anyone help, please?

    opened by babujii 0
  • tests: Move from t.Errorf and others. (Part 3)

    The third part of #8063.

    opened by Creatone 0
  • Discovery for ECS

    In an earlier evaluation, ECS discovery was rejected due to the API rate limiting issues described in the discovery section. As of today, there are ECS users who publish Prometheus metrics and use the CloudWatch Agent's Prometheus scraping capabilities. They configure the agent with a task selection mechanism to shard the load among multiple clusters. Influenced by what these users already do, we think we can tackle the problem in a couple of ways:

    • Asking users to configure the discovery to discover a set of matching tasks from a cluster, cache metadata in memory where possible.
    • Querying the initial data with the ECS API and then relying on ECS events to identify new and terminated tasks.
    • Asking users to run Prometheus as a sidecar in their ECS tasks as a last resort.

    Given that we have this functionality in the CW Agent, not having a similar capability in Prometheus is confusing for ECS users. We would like to fill this gap by contributing ECS discovery to Prometheus, and we want to switch to the discovery mechanism provided here in all our other collection agents (CW Agent, OpenTelemetry Prometheus Receiver, etc.).

    Goals

    • Discovery will only discover metric endpoints from a single cluster.
    • We will allow users to filter the tasks by the Cluster Query language and ECS tags.
    • Users should be able to specify ports and metrics path where the Prometheus metrics are published from the task. (See the config for more.)
    • ECS discovery will support both ECS on EC2 and ECS on Fargate.

    Config

    Once implemented, ECS discovery will be supported in the Prometheus config. The example below will query the cluster to discover ECS tasks/containers matching the given task selectors.

    scrape_configs:
      - job_name: ecs-job
        [ metrics_path: <string> ]
        ecs_sd_configs:
          - [ refresh_interval: <string> | default = 720s ]
            [ region: <string> ]
            cluster: <string>
            [ access_key: <string> ] 
            [ secret_key: <secret> ]
            [ profile: <string> ]
            [ role_arn: <string> ]
            ports:
                - <int>
            task_selectors:
              - [ service: <string> ]
                [ family: <string> ]
                [ revisions: <int> ]
                [ launch_type: <string> ]
                [ query: <string> ]
                [ tags: 
                   - <string>:  <string> ]
    

    Discovery

    Discovery is done by periodically polling the ListTasks API. Discovery will only return the ACTIVE tasks.

    As an improvement, we will later switch to a model where we listen to ECS events to be notified about task starts and terminations. As an optimization, this will allow us to call ListTasks only once and rely on the events for subsequent changes.
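
    A hedged sketch of that polling step with the AWS SDK for Go (the cluster name is a placeholder; a real implementation would feed the returned task ARNs into DescribeTasks):

    package main

    import (
        "fmt"
        "log"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/ecs"
    )

    func main() {
        sess := session.Must(session.NewSession())
        svc := ecs.New(sess)

        // Page through the running tasks of the target cluster.
        input := &ecs.ListTasksInput{
            Cluster:       aws.String("my-cluster"), // placeholder
            DesiredStatus: aws.String("RUNNING"),
        }
        err := svc.ListTasksPages(input, func(page *ecs.ListTasksOutput, lastPage bool) bool {
            for _, arn := range page.TaskArns {
                fmt.Println(aws.StringValue(arn))
            }
            return true // continue paging
        })
        if err != nil {
            log.Fatal(err)
        }
    }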

    Labels

    Prometheus discovery can automatically add ECS task/container labels to the scraped metrics. The discovery will add the following labels:

    Label                             | Source      | Type   | Description
    ----------------------------------|-------------|--------|--------------------------------------------------------------
    __meta_ecs_cluster                | ECS Cluster | string | ECS cluster name.
    __meta_ecs_task_launch_type       | ECS Task    | string | "ec2" or "fargate".
    __meta_ecs_task_family            | ECS Task    | string | ECS task family.
    __meta_ecs_task_family_revision   | ECS Task    | string | ECS task family revision.
    __meta_ecs_task_container         | ECS Task    | string | Name of the container.
    __meta_ecs_task_network_interface | ECS Task    | string | Network interface name, e.g. "eth1".
    __meta_ecs_task_az                | ECS Task    | string | Availability zone.
    __meta_ecs_ec2_instance_id        | EC2         | string | EC2 instance ID for the EC2 launch type; otherwise "fargate".

    Authentication & IAM

    We will use the default credential provider chain. The following permissions are required:

    • ec2:DescribeInstances
    • ecs:ListTasks
    • ecs:DescribeContainerInstances
    • ecs:DescribeTasks
    component/service discovery priority/P3 kind/feature 
    opened by rakyll 6