Doraemon is a Prometheus-based monitoring system

Overview

Doraemon is a Prometheus-based monitoring system made up of three components: the Rule Engine, the Alert Gateway, and the Web UI. Instead of defining alarm rules in a configuration file, users configure them dynamically through the Web UI, and the system integrates many customized alarm functions.

Features

  • Users can configure alarm rules dynamically through the Web UI.
  • Supports flexible alarm strategies: alarm delays (which can be used to implement alarm escalation), alarm groups, and duty groups. Users can also handle alarms in their own way by sending them to hooks.
  • Users can confirm alarms by Prometheus labels.
  • Supports maintenance groups.
  • To reduce the number of alarm notifications, alarms are aggregated by rule. Firing alarms are aggregated once per cycle, and alarm recovery information is aggregated every minute.
  • The Enterprise Edition supports multiple login modes: LDAP, OAuth 2.0, and DB.
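
The aggregation-by-rule idea can be illustrated with a minimal Go sketch. The Alert type, its fields, and the helper names are assumptions made for this example and are not taken from Doraemon's source; the sketch only shows how the alerts of one cycle could be batched so that each rule produces a single notification.

```go
package main

import (
	"fmt"
	"sort"
)

// Alert is a hypothetical, simplified alert record. The field names are
// assumptions for this sketch, not Doraemon's actual schema.
type Alert struct {
	RuleID  int
	Summary string
	Labels  map[string]string
}

// GroupByRule collects the alerts of one cycle into one batch per rule,
// so a receiver can get a single message per rule instead of one per alert.
func GroupByRule(alerts []Alert) map[int][]Alert {
	groups := make(map[int][]Alert)
	for _, a := range alerts {
		groups[a.RuleID] = append(groups[a.RuleID], a)
	}
	return groups
}

// SortedRuleIDs returns the rule IDs of the groups in ascending order,
// which keeps the output deterministic.
func SortedRuleIDs(groups map[int][]Alert) []int {
	ids := make([]int, 0, len(groups))
	for id := range groups {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	return ids
}

func main() {
	alerts := []Alert{
		{RuleID: 74, Summary: "queue backlog", Labels: map[string]string{"instance": "mq-1"}},
		{RuleID: 871, Summary: "exporter down", Labels: map[string]string{"instance": "192.168.0.2:9100"}},
		{RuleID: 74, Summary: "queue backlog", Labels: map[string]string{"instance": "mq-2"}},
	}
	groups := GroupByRule(alerts)
	for _, id := range SortedRuleIDs(groups) {
		fmt.Printf("rule %d: %d alert(s)\n", id, len(groups[id]))
	}
}
```

Grouping on the rule ID alone is the simplest possible key; a real gateway would likely also consider receivers and strategies before sending.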

Architecture

The system separates the front end from the back end. The front end uses React for data interaction and display. The back end uses the Beego framework to serve the data APIs and stores its data in MySQL.

Components

  • Rule Engine: Pulls rules from the Alert Gateway, sends them to the Prometheus server for evaluation, and pushes the resulting alerts to the Alert Gateway.
  • Alert Gateway: Aggregates alarms and sends them to the alarm receivers according to their alarm strategies.
  • Web UI: Used to add rules, alarm strategies, and maintenance groups, to confirm alarms, and to view historical alarm records.
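
The Rule Engine's pull, evaluate, and push cycle can be sketched in Go as follows. This is an illustration under stated assumptions: the Gateway and Evaluator interfaces, the Rule and Alert types, and the in-memory fakes are all hypothetical and do not reflect Doraemon's actual code or HTTP API.

```go
package main

import "fmt"

// Rule and Alert are hypothetical, simplified types for this sketch.
type Rule struct {
	ID   int
	Expr string
}

type Alert struct {
	RuleID int
	Value  float64
}

// Gateway abstracts the two interactions the Rule Engine has with the
// Alert Gateway: pulling rules and pushing alerts.
type Gateway interface {
	PullRules() ([]Rule, error)
	PushAlerts(alerts []Alert) error
}

// Evaluator abstracts the Prometheus server that evaluates a rule.
type Evaluator interface {
	Evaluate(r Rule) ([]Alert, error)
}

// RunCycle performs one pull / evaluate / push round and returns the
// number of alerts pushed to the gateway.
func RunCycle(gw Gateway, ev Evaluator) (int, error) {
	rules, err := gw.PullRules()
	if err != nil {
		return 0, fmt.Errorf("pull rules: %w", err)
	}
	var firing []Alert
	for _, r := range rules {
		alerts, err := ev.Evaluate(r)
		if err != nil {
			return 0, fmt.Errorf("evaluate rule %d: %w", r.ID, err)
		}
		firing = append(firing, alerts...)
	}
	if len(firing) == 0 {
		return 0, nil
	}
	if err := gw.PushAlerts(firing); err != nil {
		return 0, fmt.Errorf("push alerts: %w", err)
	}
	return len(firing), nil
}

// fakeGateway is an in-memory stand-in used only for demonstration.
type fakeGateway struct {
	rules  []Rule
	pushed []Alert
}

func (g *fakeGateway) PullRules() ([]Rule, error) { return g.rules, nil }

func (g *fakeGateway) PushAlerts(a []Alert) error {
	g.pushed = append(g.pushed, a...)
	return nil
}

// fakeEvaluator pretends that every rule with an even ID is firing.
type fakeEvaluator struct{}

func (fakeEvaluator) Evaluate(r Rule) ([]Alert, error) {
	if r.ID%2 == 0 {
		return []Alert{{RuleID: r.ID, Value: 1}}, nil
	}
	return nil, nil
}

func main() {
	gw := &fakeGateway{rules: []Rule{{ID: 1, Expr: "up == 0"}, {ID: 2, Expr: "up == 0"}}}
	n, err := RunCycle(gw, fakeEvaluator{})
	fmt.Println("alerts pushed:", n, "error:", err)
}
```

Hiding both peers behind interfaces keeps the cycle testable without a running Prometheus server or gateway.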

Dependencies

Quick Start

  • Clone

    $ git clone https://github.com/Qihoo360/doraemon.git
  • Modify the Configuration Files

    1. Replace "localhost" in deployments/docker-compose/conf/config.js with the IP address of the host's physical network interface.
    2. Replace "localhost" in the WebUrl setting of deployments/docker-compose/conf/app.conf with the same IP address.

  • Start Doraemon

    Start the server with docker-compose from the Doraemon project directory.

    $ cd deployments/docker-compose/
    $ docker-compose up -d

    Once the containers are up, Doraemon is available at http://hostip:32000, where hostip is the IP address of the host. The default username is "admin" and the default password is "123456".
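
The "localhost" replacement in the two configuration files can also be scripted. The Go sketch below works on the file content as a string; the function name and the sample line in main are assumptions for illustration, and reading and writing the actual files is left out.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// ReplaceLocalhost returns content with every occurrence of "localhost"
// replaced by hostIP. It rejects values that are not valid IP addresses,
// which guards against typos before a config file is rewritten.
func ReplaceLocalhost(content, hostIP string) (string, error) {
	if net.ParseIP(hostIP) == nil {
		return "", fmt.Errorf("%q is not a valid IP address", hostIP)
	}
	return strings.ReplaceAll(content, "localhost", hostIP), nil
}

func main() {
	// Illustrative sample line; the real files may look different.
	sample := `WebUrl = "http://localhost:32000"`
	out, err := ReplaceLocalhost(sample, "192.168.14.249")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

The same function could be applied to the content of both config.js and app.conf before running docker-compose.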

Instructions

Wiki

Contributor

Comments
  • CORS problem

    src: 192.168.7.x dst: 192.168.14.249 err: Access to XMLHttpRequest at 'http://192.168.14.249:8080/api/v1/login/username' from origin 'http://192.168.14.249:32000' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource.

    bug 
    opened by skycgz 9
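
    The error in this report means the browser's preflight (OPTIONS) request received no Access-Control-Allow-Origin header. The sketch below shows, with Go's standard net/http package only, what a server has to return for such a request. It is not Doraemon's code: the middleware name, the allowed header list, and the test helper are assumptions, and a Beego application would normally configure this through the framework instead.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// WithCORS wraps next so that requests from allowedOrigin get CORS headers.
// As a simplification, every OPTIONS request is answered directly with 204.
func WithCORS(allowedOrigin string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Origin") == allowedOrigin {
			h := w.Header()
			h.Set("Access-Control-Allow-Origin", allowedOrigin)
			h.Set("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE, OPTIONS")
			h.Set("Access-Control-Allow-Headers", "Content-Type, Authorization")
			h.Add("Vary", "Origin")
		}
		if r.Method == http.MethodOptions {
			w.WriteHeader(http.StatusNoContent)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// okHandler is a placeholder API handler used only for demonstration.
func okHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
}

// PreflightOrigin sends a simulated preflight request through h and returns
// the Access-Control-Allow-Origin header of the response.
func PreflightOrigin(h http.Handler, origin string) string {
	req := httptest.NewRequest(http.MethodOptions, "/api/v1/login/username", nil)
	req.Header.Set("Origin", origin)
	req.Header.Set("Access-Control-Request-Method", "POST")
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Header().Get("Access-Control-Allow-Origin")
}

func main() {
	h := WithCORS("http://192.168.14.249:32000", okHandler())
	fmt.Println(PreflightOrigin(h, "http://192.168.14.249:32000"))
}
```

    The allowed origin must match the page's origin exactly, including scheme and port, which is why the UI address on port 32000 and the API address on port 8080 count as different origins.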
  • Ingress access for a Kubernetes deployment

    I deployed Doraemon on Kubernetes and want to use a ClusterIP Service that is reached through an Ingress. I changed doraemon.yml in several places:

    1. WebUrl = "http://doraemon.***.cn"

    2. window.CONFIG = { baseURL: 'http://doraemon.***.cn', };

    3. The Service definition:

        apiVersion: v1
        kind: Service
        metadata:
          labels:
            app: doraemon-web
          name: doraemon-web
          namespace: monitoring
        spec:
          ports:
          - protocol: TCP
            port: 8080
            targetPort: 80
          selector:
            app: doraemon-web

    After deployment, opening the domain shows "no data returned", and the alertgateway container log reports: 2020/07/08 18:37:14.829 [C] [panic.go:522] Handler crashed with error runtime error: invalid memory address or nil pointer dereference

    Is something misconfigured on my side, or is only NodePort Service access supported at the moment?

    opened by Ethan30k 7
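  • alert-gateway container restarts every 60 seconds

    Hello. I run Doraemon with docker-compose. It used to work fine, but over the last two days the alert-gateway container has been restarting every 60 seconds. Without any rules added it runs normally. Have you seen this before? The error on restart is: panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0x969ab0]
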
    opened by 231bobo 5
  • Insert alter failed: index out of range

    After I configured RabbitMQ alerts, the index-out-of-range error below appears. Alert rules for other middleware do not seem to trigger it. How can I resolve this?

    2020/09/03 10:12:37.994 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster[email protected] durablefa' for key 'ruleid_labels_firedat'
    2020/09/03 10:12:37.998 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster[email protected] durablefa' for key 'ruleid_labels_firedat'
    2020/09/03 10:12:38.001 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster[email protected] durablefa' for key 'ruleid_labels_firedat'
    2020/09/03 10:12:38.004 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster[email protected] durablefa' for key 'ruleid_labels_firedat'
    2020/09/03 10:12:38.008 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster[email protected] durablefa' for key 'ruleid_labels_firedat'
    2020/09/03 10:12:38.012 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster[email protected] durablefa' for key 'ruleid_labels_firedat'
    2020/09/03 10:12:38.015 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster[email protected] durablefa' for key 'ruleid_labels_firedat'
    2020/09/03 10:12:38.019 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster[email protected] durablefa' for key 'ruleid_labels_firedat'
    2020/09/03 10:12:38.022 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster[email protected] durablefa' for key 'ruleid_labels_firedat'
    2020/09/03 10:12:38.026 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster[email protected] durablefa' for key 'ruleid_labels_firedat'
    2020/09/03 10:12:38.029 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster[email protected] durablefa' for key 'ruleid_labels_firedat'
    2020/09/03 10:12:38.033 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster[email protected] durablefa' for key 'ruleid_labels_firedat'
    2020/09/03 10:12:38.036 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster[email protected] durablefa' for key 'ruleid_labels_firedat'
    2020/09/03 10:12:38.039 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster[email protected] durablefa' for key 'ruleid_labels_firedat'
    2020/09/03 10:12:38.043 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster[email protected] durablefa' for key 'ruleid_labels_firedat'
    2020/09/03 10:12:38.046 [E] [alerts.go:667] Insert alter failed:Error 1062: Duplicate entry '74-cluster[email protected] durablefa' for key 'ruleid_labels_firedat'
    2020/09/03 10:12:38.047 [D] [server.go:2774] | 127.0.0.1| 200 | 305.073647ms| match| POST /api/v1/alerts r:/api/v1/alerts/
    panic: runtime error: index out of range
    goroutine 31 [running]:
    github.com/Qihoo360/doraemon/cmd/alert-gateway/initial.Filter(0xc00038bd98, 0xc00038bd68, 0xc00017c120, 0xc0001ee070)
    	/go/src/github.com/Qihoo360/doraemon/cmd/alert-gateway/initial/timer.go:353 +0x31d6
    github.com/Qihoo360/doraemon/cmd/alert-gateway/initial.InitTimer.func5.2(0xc000258000, 0x13, 0xc00017c120)
    	/go/src/github.com/Qihoo360/doraemon/cmd/alert-gateway/initial/timer.go:792 +0x518
    created by github.com/Qihoo360/doraemon/cmd/alert-gateway/initial.InitTimer.func5
    	/go/src/github.com/Qihoo360/doraemon/cmd/alert-gateway/initial/timer.go:770 +0xd8

    opened by echophlin 3
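  • Abnormal alerting

    I ran into two problems while using Doraemon. Could you check whether they are code bugs or configuration mistakes on my side?

    1. Alerts are sent as email through a hook. After the alerting metric has returned to a normal value, the Web UI shows the alert as recovered and a query in the Prometheus web UI also returns a normal value, yet alert emails keep arriving. The value in the alert message is the one from when the alert first fired.

    2. The alert start time received through the hook differs from the alert start time shown in the Web UI.

    The alert rule and alert plan configuration were attached as two screenshots.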

    documentation 
    opened by Chris0408 3
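  • Error when verifying a rule

    To verify the alerting flow, I created a rule that monitors whether a host's node exporter is up. After I stopped node_exporter, no record appeared in the alert history, and the gateway log showed the following error: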

    2020/05/25 15:35:20.049 [I] [controller.go:218]  [{2020-05-25 15:27:35.044897044 +0800 CST {主机exporter无响应 主机exporter无响应 871} 2020-05-25 15:27:50.044897044 +0800 CST map[instance:192.168.0.2:9100 job:ops-eryajf-test-1] 2020-05-25 15:35:20.044897044 +0800 CST 0001-01-01 00:00:00 +0000 UTC 2 2020-05-25 15:38:20.044897044 +0800 CST 0}]
    
    2020/05/25 15:35:20.050 [E] [alerts.go:566]  Insert alter failed:Error 1292: Incorrect datetime value: '0000-00-00' for column 'confirmed_at' at row 1
    2020/05/25 15:35:20.050 [D] [server.go:2774]  |     172.19.0.4| 200 |   1.029839ms|   match| POST     /api/v1/alerts   r:/api/v1/alerts/
    2020/05/25 15:35:30.322 [E] [panic.go:522]  Panic in UpdateMaintainlist:runtime error: invalid memory address or nil pointer dereference
    goroutine 11 [running]:
    github.com/Qihoo360/doraemon/cmd/alert-gateway/initial.UpdateMaintainlist.func1()
    	/go/src/github.com/Qihoo360/doraemon/cmd/alert-gateway/initial/timer.go:46 +0xb5
    panic(0xa10f00, 0xffd010)
    	/usr/local/go/src/runtime/panic.go:522 +0x1b5
    github.com/Qihoo360/doraemon/cmd/alert-gateway/initial.UpdateMaintainlist()
    	/go/src/github.com/Qihoo360/doraemon/cmd/alert-gateway/initial/timer.go:69 +0x9c1
    github.com/Qihoo360/doraemon/cmd/alert-gateway/initial.init.1.func1()
    	/go/src/github.com/Qihoo360/doraemon/cmd/alert-gateway/initial/timer.go:399 +0x64
    created by github.com/Qihoo360/doraemon/cmd/alert-gateway/initial.init.1
    	/go/src/github.com/Qihoo360/doraemon/cmd/alert-gateway/initial/timer.go:395 +0x35
    

    I imported the SQL provided in the documentation.

    opened by eryajf 3
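
    The "Incorrect datetime value: '0000-00-00'" error in this report is what MySQL typically returns in strict mode when a zero date is written to a datetime column, and the log shows Go's zero time (0001-01-01 00:00:00 +0000 UTC) for an unset timestamp. One common way to avoid writing a zero date is to store SQL NULL for unset times. The Go sketch below assumes a nullable column and uses only the standard database/sql package; the helper names are hypothetical and this is not the project's own fix.

```go
package main

import (
	"database/sql"
	"fmt"
	"time"
)

// ToNullTime maps Go's zero time.Time to SQL NULL and any other value to a
// valid timestamp, so an unset time is never sent to the database as a date.
func ToNullTime(t time.Time) sql.NullTime {
	if t.IsZero() {
		return sql.NullTime{}
	}
	return sql.NullTime{Time: t, Valid: true}
}

// sampleTimes returns an unset time and a set time for demonstration.
func sampleTimes() (zero, set time.Time) {
	return time.Time{}, time.Date(2020, 5, 25, 15, 27, 35, 0, time.UTC)
}

func main() {
	zero, set := sampleTimes()
	fmt.Println("unset time is valid:", ToNullTime(zero).Valid)
	fmt.Println("set time is valid:", ToNullTime(set).Valid)
}
```

    A sql.NullTime value can be passed directly as a query argument to database/sql; whether an ORM layer accepts it for a given field depends on that ORM.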
  • Improvements

    This change does not alter functionality; it only improves code readability.

    1. Add IsValid and IsOnDuty methods to UserGroup.
    2. Extract a record struct for the alert handlers.
    opened by SenCoder 2
  • No LDAP authentication method

    Hi there, I'm testing this project in a local environment. I manually enabled LDAP auth in the front-end app by editing doraemon/web/app/page/base/app/login.js and changing chooseMethod from 'local' to 'ldap'. The back-end log then showed ' nomatch| POST /api/v1/login/ldap'. I dug into the back-end code in 'cmd/alert-gateway/controllers/login.go', and it turns out there is no // @router /ldap [post] annotation and no LDAP authentication code there.

    So LDAP authentication appears in your roadmap, config file, and documentation, but is not implemented yet. Am I right?

    Thanks for your work. It's a good idea and a helpful project, by the way.

    enhancement help wanted 
    opened by OpenAndrus 2
  • alert-gateway reports an error when sending alerts

    2020/07/24 11:48:22.738 [E] [alerts.go:482] Insert alter failed:Error 1292: Incorrect datetime value: '0000-00-00' for column 'confirmed_at' at row 1

    opened by xing-shadow 1
  • Cannot compile

    go: finding google.golang.org/appengine v1.6.5
    go: finding github.com/go-kit/kit v0.9.0
    go: finding gopkg.in/ldap.v2 v2.5.1
    go: finding github.com/pkg/errors v0.8.1
    go: finding github.com/prometheus/client_golang v1.2.0
    go: finding github.com/shiena/ansicolor v0.0.0-20151119151921-a422bbe96644
    go: finding github.com/astaxie/beego v1.12.1
    go: finding gopkg.in/asn1-ber.v1 v1.0.0-20181015200546-f715ec2f112d
    go: finding github.com/go-ldap/ldap v3.0.3+incompatible
    go: finding github.com/prometheus/common v0.7.0

    opened by a3625311 4
Owner
Qihoo 360
360 official github