💡 A Distributed and High-Performance Monitoring System. The next generation of Open-Falcon

Overview

Nightingale


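Introduction to Nightingale

Nightingale is a distributed, highly available operations monitoring system. Its most distinctive feature is hybrid-cloud support: it handles traditional physical-machine and virtual-machine environments as well as Kubernetes container environments. Nightingale is also more than monitoring; it includes some CMDB and operations-automation capabilities, and many companies build their own operations platforms on top of it. The open-source modules are also part of the commercial edition, so they are reliable and will continue to be maintained. A screenshot is shown below:

[Screenshot: Nightingale]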

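OCE Certification

OCE is a certification mechanism and exchange platform built for organizations running Nightingale in production. OCE members receive better technical support, such as dedicated technical salons, one-on-one exchange opportunities, and a dedicated Q&A group. If your company already runs Nightingale in production, you are welcome to join.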

Documentation

Community and support

Follow the official WeChat account Obsuite and reply "夜莺加群" to join the user group.

Comments
  • OIDC is configured, but the OIDC login option is not displayed. What is going wrong?

    Relevant server.conf | webapi.conf

    [OIDC]
    Enable = true
    RedirectURL = "http://ip:18000/callback"
    SsoAddr = "http://ip/oidc/login"
    ClientId = "xxxxxxxxxxxxxx"
    ClientSecret = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    CoverAttributes = true
    # default roles
    DefaultRoles = ["Standard"]
    
    # attribute mapping
    [OIDC.Attributes]
    Nickname = "nickname"
    Phone = "phone_number"
    Email = "email"
    

    Relevant logs

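    OIDC is configured, but the OIDC login option is not displayed. What is going wrong?
    

    System info

    OIDC is configured, but the OIDC login option is not displayed. What is going wrong?

    Steps to reproduce

    1. OIDC is configured, but the OIDC login option is not displayed. What is going wrong? 2. 3. ...

    Expected behavior

    OIDC is configured, but the OIDC login option is not displayed. What is going wrong?

    Actual behavior

    OIDC is configured, but the OIDC login option is not displayed. What is going wrong?

    Additional info

    OIDC is configured, but the OIDC login option is not displayed. What is going wrong?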

    opened by 601579263 20
  • Help wanted: when using ibex with categraf-v0.2.10 for self-healing scripts, the self-healing script configured on the alert rule does not take effect. categraf and ibex-agent are installed on one server, and the Nightingale stack is installed with Docker on another server. Looking for troubleshooting ideas.

    Relevant server.conf | webapi.conf

    cat /data/categraf-v0.2.10-linux-amd64/conf/config.toml
    [global]
    # whether print configs
    print_configs = false
    
    # add label(agent_hostname) to series
    # "" -> auto detect hostname
    # "xx" -> use specified string xx
    # "$hostname" -> auto detect hostname
    # "$ip" -> auto detect ip
    # "$hostname-$ip" -> auto detect hostname and ip to replace the vars
    hostname = "$ip"
    
    
    /ibex/etc/agentd.conf configuration
    # debug, release
    RunMode = "debug"
    
    # task meta storage dir
    MetaDir = "./meta"
    
    [HTTP]
    Enable = true
    # http listening address
    Host = "0.0.0.0"
    # http listening port
    Port = 2090
    # https cert file path
    CertFile = ""
    # https key file path
    KeyFile = ""
    # whether print access log
    PrintAccessLog = true
    # whether enable pprof
    PProf = false
    # http graceful shutdown timeout, unit: s
    ShutdownTimeout = 30
    # max content length: 64M
    MaxContentLength = 67108864
    # http server read timeout, unit: s
    ReadTimeout = 20
    # http server write timeout, unit: s
    WriteTimeout = 40
    # http server idle timeout, unit: s
    IdleTimeout = 120
    
    [Heartbeat]
    # unit: ms
    Interval = 1000
    # rpc servers
    Servers = ["172.18.89.20:20090"]
    # $ip or $hostname or specified string
    Host = "172.18.89.18"
    

    Relevant logs

    none
    

    System info

    Frontend version: 5.9.0; backend version: v5.10.3-23d7e5a7de5d0ffea7ab941e9621ef7f53071775

    Steps to reproduce

    ...

    Expected behavior

    The self-healing script is not triggered when the alert rule fires.
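
One thing worth ruling out first is plain network reachability. The agent's [Heartbeat] section points at 172.18.89.20:20090, and with Nightingale running in Docker on another host that port has to be published and reachable from the agent machine. A minimal sketch of such a check, run from the agent host, assuming only the address shown in the config above (the helper name `can_connect` is hypothetical and not part of ibex):

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example call, using the address from Servers = ["172.18.89.20:20090"]:
#   can_connect("172.18.89.20", 20090)
```

A False result would point to a port-mapping or firewall problem rather than to the alert rule itself; a True result only shows that the port is open, not that the heartbeat is accepted.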

    Actual behavior

    none

    Additional info

    none

    opened by beginend2012 19
  • The 3.0 web page does not render correctly

    Installed following the installation steps; the problem is probably in the frontend. The console reports the following error twice:

    SyntaxError: Unexpected end of JSON input
        at JSON.parse ()
        at layout-2ad14e8510d33d9e5c84.js:81829
        at ia (layout-2ad14e8510d33d9e5c84.js:59318)
        at La (layout-2ad14e8510d33d9e5c84.js:59318)
        at Va (layout-2ad14e8510d33d9e5c84.js:59318)
        at Ha (layout-2ad14e8510d33d9e5c84.js:59318)
        at Mc (layout-2ad14e8510d33d9e5c84.js:59318)
        at xc (layout-2ad14e8510d33d9e5c84.js:59318)
        at yc (layout-2ad14e8510d33d9e5c84.js:59318)
        at Ga (layout-2ad14e8510d33d9e5c84.js:59318)
    layout-2ad14e8510d33d9e5c84.js:81829

    opened by lihuiheng 17
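  • The AND condition in alert configuration seems to be broken

    Current alert strategy configuration: [image]

    Current state of the corresponding metrics: [image]

    The goal is to alert when both proc.port.listen < 1 and file.lock.exist < 1 are satisfied.
    The current state satisfies the alert condition, but no alert is triggered.

    When each condition is configured on its own, without the AND condition, each one triggers an alert as expected.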
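
The intended semantics can be written down as a small sketch: an event should fire only when every expression in the strategy holds for the latest values at the same time. This illustrates the expected AND logic only; it is not Nightingale's judge implementation, and the function name and data layout are hypothetical.

```python
import operator

# Comparison operators an expression may use.
OPS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    "!=": operator.ne, ">": operator.gt, ">=": operator.ge,
}


def all_conditions_met(latest: dict, exprs: list) -> bool:
    """True only if every (metric, op, threshold) holds for the latest values.

    A metric with no reported value counts as not met.
    """
    for metric, op, threshold in exprs:
        if metric not in latest:
            return False
        if not OPS[op](latest[metric], threshold):
            return False
    return True


exprs = [("proc.port.listen", "<", 1), ("file.lock.exist", "<", 1)]
print(all_conditions_met({"proc.port.listen": 0, "file.lock.exist": 0}, exprs))  # True
print(all_conditions_met({"proc.port.listen": 0, "file.lock.exist": 1}, exprs))  # False
```

Note the "missing value" branch: if the two metrics are reported at different times or with different tags, a combined check may never see both values together, which is one possible reason an AND rule stays silent while each single rule fires.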

    opened by AlliotTech 15
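  • Nightingale v3: no alert event is generated when the agent stops

    After redeploying Nightingale from v2 to v3, other alerts generate alert events normally, but no alert event is generated when the agent stops.

    The monitoring strategy is as follows:

    {
      "name": "监控agent失联",
      "category": 1,
      "alert_dur": 60,
      "recovery_dur": 0,
      "recovery_notify": 1,
      "enable_stime": "00:00",
      "enable_etime": "23:59",
      "priority": 1,
      "exprs": [
        { "eopt": "=", "func": "nodata", "metric": "proc.agent.alive", "params": [], "threshold": 0 }
      ],
      "tags": [],
      "enable_days_of_week": [0, 1, 2, 3, 4, 5, 6],
      "converge": [36000, 1],
      "endpoints": null
    },

    All other alerts work, and all my monitoring strategies are placed on one master node. When the agent stops, the monitoring graphs show that proc.agent.alive is no longer being reported, yet no event appears under unrecovered alerts. Why is that? Any pointers would be appreciated.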

    opened by linux-david 14
  • Querying data reported through the API returns NaN

    Relevant server.conf | webapi.conf

    tsdb.yml
    rrd:
      storage: /home/storage/n9e_data/8011
    cache:
      keepMinutes: 120
    logger:
      dir: logs/tsdb
      level: WARNING
      keepHours: 2
    
    transfer.yml
    backend:
      datasource: "tsdb"
      m3db:
        enabled: false
        maxSeriesPoints: 720                       # default 720
        name: "m3db"
        namespace: "default"
        seriesLimit: 0
        docsLimit: 0
        daysLimit: 7                               # max query time
        # https://m3db.github.io/m3/m3db/architecture/consistencylevels/
        writeConsistencyLevel: "majority"          # one|majority|all
        readConsistencyLevel: "unstrict_majority"  # one|unstrict_majority|majority|all
        config:
          service:
            # KV environment, zone, and service from which to write/read KV data (placement
            # and configuration). Leave these as the default values unless you know what
            # you're doing.
            env: default_env
            zone: embedded
            service: m3db
            etcdClusters:
              - zone: embedded
                endpoints:
                  - 127.0.0.1:2379
                tls:
                  caCrtPath: /etc/etcd/certs/ca.pem
                  crtPath: /etc/etcd/certs/etcd-client.pem
                  keyPath: /etc/etcd/certs/etcd-client-key.pem
      tsdb:
        enabled: true
        name: "tsdb"
        cluster:
          tsdb01: 127.0.0.1:8011
      influxdb:
        enabled: false
        username: "influx"
        password: "admin123"
        precision: "s"
        database: "n9e"
        address: "http://127.0.0.1:8086"
      opentsdb:
        enabled: false
        address: "127.0.0.1:4242"
      kafka:
        enabled: false
        brokersPeers: "192.168.1.1:9092,192.168.1.2:9092"
        topic: "n9e"
    logger:
      dir: logs/transfer
      level: INFO
      keepHours: 24
    

    Relevant logs

    2022-07-06 18:39:47.942693 WARNING rpc/query.go:118 debug: true, /home/storage/n9e_data/8011/cd/cdf3bd9e6ba35f20b66aa65ddd330365_GAUGE_7200.rrd
    2022-07-06 18:39:47.943026 WARNING rpc/query.go:145 data: [<RRDData:Value:NaN TS:1654480800 2022-06-06 10:00:00> <RRDData:Value:NaN TS:1654488000 2022-06-06 12:00:00> <RRDData:Value:NaN TS:1654495200 2022-06-06 14:00:00> <RRDData:Value:NaN TS:1654502400 2022-06-06 16:00:00> <RRDData:Value:NaN TS:1654509600 2022-06-06 18:00:00> <RRDData:Value:NaN TS:1654516800 2022-06-06 20:00:00> <RRDData:Value:NaN TS:1654524000 2022-06-06 22:00:00> <RRDData:Value:NaN TS:1654531200 2022-06-07 00:00:00> <RRDData:Value:NaN TS:1654538400 2022-06-07 02:00:00> <RRDData:Value:NaN TS:1654545600 2022-06-07 04:00:00> <RRDData:Value:NaN TS:1654552800 2022-06-07 06:00:00> <RRDData:Value:NaN TS:1654560000 2022-06-07 08:00:00> <RRDData:Value:NaN TS:1654567200 2022-06-07 10:00:00> <RRDData:Value:NaN TS:1654574400 2022-06-07 12:00:00> <RRDData:Value:NaN TS:1654581600 2022-06-07 14:00:00> <RRDData:Value:NaN TS:1654588800 2022-06-07 16:00:00> <RRDData:Value:NaN TS:1654596000 2022-06-07 18:00:00> <RRDData:Value:NaN TS:1654603200 2022-06-07 20:00:00> <RRDData:Value:NaN TS:1654610400 2022-06-07 22:00:00> <RRDData:Value:NaN TS:1654617600 2022-06-08 00:00:00>
    

    System info

    n9e 3.8.0

    Steps to reproduce

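    1. Bulk-report previously collected historical data through /api/transfer/data.
    2. When viewing charts on the monitoring page, the data cannot be displayed.
    3. The logs show that the data file opens normally, but some of the queried values are NaN.

    Expected behavior

    Data can be queried normally and charts are displayed.

    Actual behavior

    Charts cannot be displayed; as the logs show, the queried Value is NaN.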

    Additional info

    No response

    opened by rhizoma-atractylodis 11
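  • Problems using active alert aggregation rules

    Frontend version: 5.5.1; backend version: 5.9.3

    First problem: [image] [image] At this point it asks whether to make the rule public. Logically, the switch should be on for public and off for private, but the rule was only created successfully after I switched it to public and then back to private. [image] I suggest improving this logic.

    Second problem

    [image] [image] I added the __name__ aggregation rule, but what is actually displayed is Null. [image] When I edit the aggregation rule and delete the __name__ label, it reports: unsupported field: name. This is odd: sometimes the label can be added and sometimes it cannot. I tried other labels from the alerts and hit the same problem.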

    opened by FengZh61 11
  • [Multi-cluster configuration] After configuring webapi.conf and server.conf, only node information (ident) reaches the central side; Prometheus instant data cannot be queried.

    Relevant server.conf | webapi.conf

    Central side: webapi.conf
    
    # central cluster info
    [[Clusters]]
    # Prometheus cluster name
    Name = "Default"
    # Prometheus APIs base url
    Prom = "http://127.0.0.1:9090"
    # Basic auth username
    BasicAuthUser = ""
    # Basic auth password
    BasicAuthPass = ""
    # timeout settings, unit: ms
    Timeout = 30000
    DialTimeout = 3000
    MaxIdleConnsPerHost = 100
    
    # regional cluster info
    [[Clusters]]
    # Prometheus cluster name
    Name = "zhifawang_cluster"
    # Prometheus APIs base url
    Prom = "http://regional-ip:9090"
    # Basic auth username
    BasicAuthUser = ""
    # Basic auth password
    BasicAuthPass = ""
    # timeout settings, unit: ms
    Timeout = 30000
    DialTimeout = 3000
    MaxIdleConnsPerHost = 100
    
    Regional server.conf
    [DB]
    # postgres: host=%s port=%s user=%s dbname=%s password=%s sslmode=%s
    DSN="root:[email protected](central-mysql-ip:3306)/n9e_v5?charset=utf8mb4&parseTime=True&loc=Local&allowNativePasswords=true"
    # enable debug mode or not
    Debug = false
    # mysql postgres
    DBType = "mysql"
    # unit: s
    MaxLifetime = 7200
    # max open connections
    MaxOpenConns = 150
    # max idle connections
    MaxIdleConns = 50
    # table prefix
    TablePrefix = ""
    # enable auto migrate or not
    EnableAutoMigrate = false
    
    # central side server.conf
    [Reader]
    # prometheus base url
    Url = "http://127.0.0.1:9090"
    # Basic auth username
    BasicAuthUser = ""
    # Basic auth password
    BasicAuthPass = ""
    # timeout settings, unit: ms
    Timeout = 30000
    DialTimeout = 10000
    TLSHandshakeTimeout = 30000
    ExpectContinueTimeout = 1000
    IdleConnTimeout = 90000
    # time duration, unit: ms
    KeepAlive = 30000
    MaxConnsPerHost = 0
    MaxIdleConns = 100
    MaxIdleConnsPerHost = 10
    
    [[Writers]]
    Url = "http://127.0.0.1:9090/api/v1/write"
    # Basic auth username
    BasicAuthUser = ""
    # Basic auth password
    BasicAuthPass = ""
    # timeout settings, unit: ms
    Timeout = 10000
    DialTimeout = 3000
    TLSHandshakeTimeout = 30000
    ExpectContinueTimeout = 1000
    IdleConnTimeout = 90000
    # time duration, unit: ms
    KeepAlive = 30000
    MaxConnsPerHost = 0
    MaxIdleConns = 100
    MaxIdleConnsPerHost = 100
    
    # regional cluster server.conf
    [Reader]
    # prometheus base url
    Url = "http://prometheus:9090"
    # Basic auth username
    BasicAuthUser = ""
    # Basic auth password
    BasicAuthPass = ""
    # timeout settings, unit: ms
    Timeout = 30000
    DialTimeout = 10000
    TLSHandshakeTimeout = 30000
    ExpectContinueTimeout = 1000
    IdleConnTimeout = 90000
    # time duration, unit: ms
    KeepAlive = 30000
    MaxConnsPerHost = 0
    MaxIdleConns = 100
    MaxIdleConnsPerHost = 10
    
    [[Writers]]
    Url = "http://prometheus:9090/api/v1/write"
    # Basic auth username
    BasicAuthUser = ""
    # Basic auth password
    BasicAuthPass = ""
    # timeout settings, unit: ms
    Timeout = 30000
    DialTimeout = 10000
    TLSHandshakeTimeout = 30000
    ExpectContinueTimeout = 1000
    IdleConnTimeout = 90000
    # time duration, unit: ms
    KeepAlive = 30000
    MaxConnsPerHost = 0
    MaxIdleConns = 100
    MaxIdleConnsPerHost = 100
    

    Relevant logs

    Regional nserverd log
    2022-06-21 09:44:09.616229 WARNING writer/writer.go:42 post to http://prometheus:9090/api/v1/write got error: push data with remote write request got status code: 500, response body: label name "busigroup" is not unique: invalid sample
    2022-06-21 09:44:09.616338 WARNING writer/writer.go:43 example timeseries:labels:<name:"__name__" value:"kernel_processes_forked" > labels:<name:"ident" value:"11.68.150.59_\351\251\254\351\201\223\345\244\264" > labels:<name:"busigroup" value:"jykj_dt" > labels:<name:"busigroup" value:"jykj_dt" > samples:<value:4.0868227e+07 timestamp:1655775848000 >
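
The log shows the cause directly: the example series carries the label busigroup twice, and the remote write endpoint rejects the sample because label names must be unique. A minimal sketch of a pre-write check that drops repeated label names and keeps the first occurrence; the function name and the list-of-pairs layout are assumptions for illustration, not n9e's writer code:

```python
def dedupe_labels(labels):
    """Return labels with each name appearing once; the first occurrence wins."""
    seen = set()
    unique = []
    for name, value in labels:
        if name in seen:
            continue  # skip a label name that was already emitted
        seen.add(name)
        unique.append((name, value))
    return unique


labels = [
    ("__name__", "kernel_processes_forked"),
    ("busigroup", "jykj_dt"),
    ("busigroup", "jykj_dt"),
]
print(dedupe_labels(labels))
```

Duplicates like this usually mean the same label is being attached in more than one place, so it is worth checking where busigroup is added on the regional side before the data reaches Prometheus.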
    

    System info

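    Frontend version: 5.5.1; backend version: 5.9.2

    Steps to reproduce

    1. Modify the cluster information in the central webapi.conf. 2. Modify the DB information in the regional n9e server.conf. 3. Restart the regional n9e server service. ...

    Expected behavior

    The central side shows node information and instant data for multiple clusters. Note: the central side is deployed as standalone components; the regional side is deployed with Docker.

    Actual behavior

    The central side shows multi-cluster node information in the object list, but the monitoring graphs cannot show data from the other cluster.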

    Additional info

    image

    opened by yinyanqiang 10
  • Gaps in the data; not sure how to troubleshoot

    What happened: metric data from one host is intermittent

    What you expected to happen: the logs show no problems

    How to reproduce it (as minimally and precisely as possible):

    Anything else we need to know?:

    Environment:

    • OS (e.g. cat /etc/os-release): CentOS 6.6
    • Logs:
    • Others:
    opened by fangyt-ect 10
  • feat: persist notify cur number

    Persist the consecutive notification count, so that it is easy to see at a glance how many times in a row an alert has been notified.

    PostgreSQL update script:
    ALTER TABLE alert_cur_event ADD notify_cur_number int not null default 0;
    ALTER TABLE alert_his_event ADD notify_cur_number int not null default 0;

    opened by tanxiao1990 9
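  • Referencing a variable in a dashboard PromQL query raises an error and the chart cannot be configured

    Nightingale version: downloaded the 5.6.3 source and deployed with docker-compose

    • Frontend version: 5.2.1

    • Backend version: 5.6.3

    • Chrome version: 84.0.4147.105 (official build) (64-bit)

    Problem and steps to reproduce: after deployment, configure a dashboard in the frontend.

    Add a host variable to the dashboard, then add any chart with the following PromQL: cpu_usage_user{ident="$host"}. Click save; it does not take effect, and the console reports the following error: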

    vendor.afcc874e.js:27 TypeError: i.replaceAll is not a function
        at u1 (index.a3d7ea18.js:33)
        at index.a3d7ea18.js:33
        at Oe (vendor.afcc874e.js:49)
        at Function.xa (vendor.afcc874e.js:49)
        at y (index.a3d7ea18.js:33)
        at index.a3d7ea18.js:33
        at Ss (vendor.afcc874e.js:27)
        at t.unstable_runWithPriority (vendor.afcc874e.js:18)
        at Ri (vendor.afcc874e.js:27)
        at _s (vendor.afcc874e.js:27)
    

    For comparison, with the ident value hard-coded the chart loads normally: cpu_usage_user{ident="telegraf01"}
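
The trace points at String.prototype.replaceAll, which Chrome supports only from version 85. On the Chrome 84 build listed above the method does not exist, which would explain why the hard-coded query works while the templated one fails. The substitution step itself is simple; a rough sketch of the idea, assuming variables are written as `$name` (the function is hypothetical and is not the frontend's code):

```python
import re


def render_promql(promql: str, variables: dict) -> str:
    """Replace every $name in promql with its value; unknown names are left as-is."""
    def repl(match):
        name = match.group(1)
        return str(variables.get(name, match.group(0)))

    return re.sub(r"\$([A-Za-z_][A-Za-z0-9_]*)", repl, promql)


print(render_promql('cpu_usage_user{ident="$host"}', {"host": "telegraf01"}))
# cpu_usage_user{ident="telegraf01"}
```

Upgrading the browser is the quickest way to confirm this diagnosis.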

    opened by xiaohuione 9
  • Please help analyze: the server log contains many "client error: 404" entries. Thanks.

    Relevant server.conf | webapi.conf

    Relevant logs

    2022-12-28 16:50:42.090274 ERROR engine/worker.go:209 rule_eval:245 promql:increase(net_drop_out[1m]) > 0, error:client_error: client error: 404
    2022-12-28 16:50:42.093313 ERROR engine/worker.go:209 rule_eval:263 promql:http_response_http_response_code > 500, error:client_error: client error: 404
    2022-12-28 16:50:42.114400 ERROR engine/worker.go:209 rule_eval:246 promql:netstat_tcp_time_wait > 20000, error:client_error: client error: 404
    2022-12-28 16:50:42.127133 ERROR engine/worker.go:209 rule_eval:244 promql:increase(net_drop_in[1m]) > 0, error:client_error: client error: 404
    2022-12-28 16:50:42.128994 ERROR engine/worker.go:209 rule_eval:238 promql:target_up != 1, error:client_error: client error: 404
    2022-12-28 16:50:42.139551 ERROR engine/worker.go:209 rule_eval:249 promql:procstat_lookup_result_code != 0, error:client_error: client error: 404
    2022-12-28 16:50:42.147291 ERROR engine/worker.go:209 rule_eval:242 promql:rate(diskio_io_time[1m])/10 > 99, error:client_error: client error: 404
    2022-12-28 16:50:42.152683 ERROR engine/worker.go:209 rule_eval:277 promql:disk_used_percent > 85, error:client_error: client error: 404
    2022-12-28 16:50:42.155019 ERROR engine/worker.go:209 rule_eval:239 promql:net_response_result_code != 0, error:client_error: client error: 404
    2022-12-28 16:50:42.156042 ERROR engine/worker.go:209 rule_eval:247 promql:procstat_lookup_running == 0, error:client_error: client error: 404
    2022-12-28 16:50:42.182513 ERROR engine/worker.go:209 rule_eval:241 promql:mem_available_percent < 10, error:client_error: client error: 404
    2022-12-28 16:50:42.186227 ERROR engine/worker.go:209 rule_eval:248 promql:procstat_rlimit_num_fds_soft < 2048, error:client_error: client error: 404
    2022-12-28 16:50:42.190165 ERROR engine/worker.go:209 rule_eval:240 promql:cpu_usage_idle{cpu="cpu-total"} < 25, error:client_error: client error: 404
    2022-12-28 16:50:42.265185 ERROR engine/worker.go:209 rule_eval:237 promql:ping_result_code != 0, error:client_error: client error: 404
    2022-12-28 16:50:42.298631 ERROR engine/worker.go:209 rule_eval:243 promql:predict_linear(disk_free[1h], 4*3600) < 0, error:client_error: client error: 404
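
Every rule fails with the same 404, which suggests the request path rather than any single PromQL expression. Prometheus serves instant queries at /api/v1/query under its base URL, so a [Reader] Url that carries an extra path segment, or that points at a service that is not Prometheus-compatible, could produce exactly this. A small sketch for building the URL to test by hand; `build_query_url` is a hypothetical helper and the base URL is a placeholder:

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Build the Prometheus instant-query URL for a base URL and an expression."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


# Placeholder base URL; substitute the Url value from the [Reader] section.
print(build_query_url("http://127.0.0.1:9090", "target_up != 1"))
# http://127.0.0.1:9090/api/v1/query?query=target_up+%21%3D+1
```

Requesting the printed URL from the n9e server host (for example with curl) shows whether the configured time-series backend answers on that path at all.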
    

    System info

    n9e-v5.14.3-linux-amd64 centos

    Steps to reproduce

    Expected behavior

    Actual behavior

    Additional info

    No response

    opened by worker24h 0
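  • 5.14.4: muted alerts are still sent

    Relevant server.conf | webapi.conf

    Based on the 5.13.1 configuration, the only change is the following line added under [WriterOpt] in server.conf:
    ShardingKey = "ident"
    Everything else is the same as in 5.13.1.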
    

    Relevant logs

    2022-12-23 18:38:58.012905 INFO engine/logger.go:19 event(568b8a2de13649785ce5ae2648171945 triggered) consume: rule_id=62 [__name__=port_plugin_collector_8080 env=online host=xxxxxx0004-vm instance=xxxxxx job=node_exporters_http node=xxxxxxx4-vm rulename=服务挂了(8080端口挂了) service=xxxxxx][email protected]
    

    System info

    n9e 5.14.4,n9e-fe 5.14.3,centos

    Steps to reproduce

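    1. Before the release, mute alerts for the machine for 20 minutes (18:34 to 18:54). [image]

    2. Start the release and restart the service.
    3. The muted alert was sent anyway. During that time the mute rule was present in Nightingale (I forgot to take a screenshot). The 2s duration shown in the alert is also unlikely, since my rule evaluates every 60s and alerts only after 3 consecutive hits. [image]

    ...

    Expected behavior

    Muted rules should not alert.

    Actual behavior

    The muted rule alerted anyway.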
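
For reference, the expected check is a plain interval test: an event whose trigger time falls between the mute start and end should be suppressed. Using the times from this report (mute from 18:34 to 18:54, event logged at 18:38:58), here is a minimal sketch with hypothetical names; it makes no claim about how n9e evaluates mutes internally:

```python
from datetime import datetime


def is_muted(event_time: datetime, mute_start: datetime, mute_end: datetime) -> bool:
    """True if event_time lies inside the mute window (bounds inclusive)."""
    return mute_start <= event_time <= mute_end


# The date is taken from the log line; the window is assumed to be on the same day.
start = datetime(2022, 12, 23, 18, 34)
end = datetime(2022, 12, 23, 18, 54)
event = datetime(2022, 12, 23, 18, 38, 58)
print(is_muted(event, start, end))  # True
```

By this test the logged event falls inside the window, so the notification contradicts the mute rule as described.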

    Additional info

    No response

    opened by shenghuofei 13
  • update: view metrics data by instance

    What type of PR is this? update mongo dashboard template

    What this PR does / why we need it:

    Each chart in the dashboard contains the metric data of all instances. After updating, you can select the instance.

    Which issue(s) this PR fixes:

    Fixes # https://github.com/flashcatcloud/categraf/issues/255

    Special notes for your reviewer:

    opened by lunuan 0
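  • Feishu alert message templates should support "message card" mode so that the "lark_md" message format can be used

    What would you like to be added: Feishu alert message templates should support "message card" mode so that the "lark_md" message format can be used.

    Why is this needed: With a Feishu V2 alert template, the title color could vary with the alert status (similar to the DingTalk and WeCom alert templates). This makes alert states easier to tell apart and more convenient for operations and business staff to follow.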

    opened by XiaoMuYi 0
  • toolkits\pkg 1.3.1, referenced by v5.14.1 and v5.14.2, has a bug

    Relevant server.conf | webapi.conf

    # debug, release
    RunMode = "release"
    
    # my cluster name
    ClusterName = "Default"
    
    # Default busigroup Key name
    # do not change
    BusiGroupLabelKey = "busigroup"
    
    # sleep x seconds, then start judge engine
    EngineDelay = 60
    
    DisableUsageReport = false
    
    # config | database
    ReaderFrom = "config"
    
    [Log]
    # log write dir
    Dir = "logs"
    # log level: DEBUG INFO WARNING ERROR
    Level = "INFO"
    # stdout, stderr, file
    Output = "stdout"
    # # rotate by time
    # KeepHours: 4
    # # rotate by size
    # RotateNum = 3
    # # unit: MB
    # RotateSize = 256
    
    [HTTP]
    # http listening address
    Host = "0.0.0.0"
    # http listening port
    Port = 19000
    # https cert file path
    CertFile = ""
    # https key file path
    KeyFile = ""
    # whether print access log
    PrintAccessLog = false
    # whether enable pprof
    PProf = false
    # http graceful shutdown timeout, unit: s
    ShutdownTimeout = 30
    # max content length: 64M
    MaxContentLength = 67108864
    # http server read timeout, unit: s
    ReadTimeout = 20
    # http server write timeout, unit: s
    WriteTimeout = 40
    # http server idle timeout, unit: s
    IdleTimeout = 120
    
    # [BasicAuth]
    # user002 = "ccc26da7b9aba533cbb263a36c07dcc9"
    
    [Heartbeat]
    # auto detect if blank
    IP = ""
    # unit ms
    Interval = 1000
    
    [SMTP]
    Host = "smtp.163.com"
    Port = 994
    User = "username"
    Pass = "password"
    From = "[email protected]"
    InsecureSkipVerify = true
    Batch = 5
    
    [Alerting]
    # timeout settings, unit: ms, default: 30000ms
    Timeout=30000
    TemplatesDir = "./etc/template"
    NotifyConcurrency = 10
    # use builtin go code notify
    NotifyBuiltinChannels = ["email", "dingtalk", "wecom", "feishu", "mm"]
    
    [Alerting.CallScript]
    # built in sending capability in go code
    # so, no need enable script sender
    Enable = false
    ScriptPath = "./etc/script/notify.py"
    
    [Alerting.CallPlugin]
    Enable = false
    # use a plugin via `go build -buildmode=plugin -o notify.so`
    PluginPath = "./etc/script/notify.so"
    # The first letter must be capitalized to be exported
    Caller = "N9eCaller"
    
    [Alerting.RedisPub]
    Enable = false
    # complete redis key: ${ChannelPrefix} + ${Cluster}
    ChannelPrefix = "/alerts/"
    
    [Alerting.Webhook]
    Enable = false
    Url = "http://a.com/n9e/callback"
    BasicAuthUser = ""
    BasicAuthPass = ""
    Timeout = "5s"
    Headers = ["Content-Type", "application/json", "X-From", "N9E"]
    
    [NoData]
    Metric = "target_up"
    # unit: second
    Interval = 120
    
    [Ibex]
    # callback: ${ibex}/${tplid}/${host}
    Address = "127.0.0.1:10090"
    # basic auth
    BasicAuthUser = "ibex"
    BasicAuthPass = "ibex"
    # unit: ms
    Timeout = 3000
    
    [Redis]
    # address, ip:port or ip1:port,ip2:port for cluster and sentinel(SentinelAddrs)
    Address = "127.0.0.1:6379"
    # Username = ""
    # Password = ""
    # DB = 0
    # UseTLS = false
    # TLSMinVersion = "1.2"
    # standalone cluster sentinel
    RedisType = "standalone"
    # Mastername for sentinel type
    # MasterName = "mymaster"
    
    [DB]
    # postgres: host=%s port=%s user=%s dbname=%s password=%s sslmode=%s
    DSN="root:[email protected](127.0.0.1:3306)/n9e_v5?charset=utf8mb4&parseTime=True&loc=Local&allowNativePasswords=true"
    # enable debug mode or not
    Debug = false
    # mysql postgres
    DBType = "mysql"
    # unit: s
    MaxLifetime = 7200
    # max open connections
    MaxOpenConns = 150
    # max idle connections
    MaxIdleConns = 50
    # table prefix
    TablePrefix = ""
    # enable auto migrate or not
    # EnableAutoMigrate = false
    
    [Reader]
    # prometheus base url
    Url = "http://127.0.0.1:9090"
    # Basic auth username
    BasicAuthUser = ""
    # Basic auth password
    BasicAuthPass = ""
    # timeout settings, unit: ms
    Timeout = 30000
    DialTimeout = 3000
    MaxIdleConnsPerHost = 100
    
    [WriterOpt]
    # queue channel count
    QueueCount = 1000
    # queue max size
    QueueMaxSize = 1000000
    # once pop samples number from queue
    QueuePopSize = 1000
    # metric or ident
    ShardingKey = "ident"
    
    [[Writers]]
    Url = "http://127.0.0.1:9090/api/v1/write"
    # Basic auth username
    BasicAuthUser = ""
    # Basic auth password
    BasicAuthPass = ""
    # timeout settings, unit: ms
    Headers = ["X-From", "n9e"]
    Timeout = 10000
    DialTimeout = 3000
    TLSHandshakeTimeout = 30000
    ExpectContinueTimeout = 1000
    IdleConnTimeout = 90000
    # time duration, unit: ms
    KeepAlive = 30000
    MaxConnsPerHost = 0
    MaxIdleConns = 100
    MaxIdleConnsPerHost = 100
    # [[Writers.WriteRelabels]]
    # Action = "replace"
    # SourceLabels = ["__address__"]
    # Regex = "([^:]+)(?::\\d+)?"
    # Replacement = "$1:80"
    # TargetLabel = "__address__"
    
    # [[Writers]]
    # Url = "http://127.0.0.1:7201/api/v1/prom/remote/write"
    # # Basic auth username
    # BasicAuthUser = ""
    # # Basic auth password
    # BasicAuthPass = ""
    # # timeout settings, unit: ms
    # Timeout = 30000
    # DialTimeout = 10000
    # TLSHandshakeTimeout = 30000
    # ExpectContinueTimeout = 1000
    # IdleConnTimeout = 90000
    # # time duration, unit: ms
    # KeepAlive = 30000
    # MaxConnsPerHost = 0
    # MaxIdleConns = 100
    # MaxIdleConnsPerHost = 100
    

    Relevant logs

    # github.com/toolkits/pkg/logger
    \go\pkg\mod\github.com\toolkits\[email protected]\logger\config.go:37:32: cannot use sb (variable of type *syslogBackend) as type Backend in argument to log.SetLogging:
    	*syslogBackend does not implement Backend (missing Close method)
    		have close()
    		want Close()
    

    System info

    n9e v5.14.1 v5.14.2

    Steps to reproduce

    1. Set the startup arguments server conf=server.json, then start

    Expected behavior

    Startup should succeed

    Actual behavior

    Startup fails with the message:

    github.com/toolkits/pkg/logger

    \go\pkg\mod\github.com\toolkits\[email protected]\logger\config.go:37:32: cannot use sb (variable of type *syslogBackend) as type Backend in argument to log.SetLogging: *syslogBackend does not implement Backend (missing Close method) have close() want Close()

    Additional info

    This is a bug in toolkits/pkg: https://github.com/toolkits/pkg/issues/10

    opened by jialine 0
Releases(v5.14.4)
  • v5.14.4(Dec 18, 2022)

    What's Changed

    • feat: support fetch user info based on query type @lsy1990 in https://github.com/ccfos/nightingale/pull/1326
    • feat: target tags can rewrite labels defined in categraf config file @allenz92 in https://github.com/ccfos/nightingale/pull/1321
    • fix: alert mute

    New Contributors

    • @allenz92 made their first contribution in https://github.com/ccfos/nightingale/pull/1321

    Full Changelog: https://github.com/ccfos/nightingale/compare/v5.14.2...v5.14.3

    Source code(tar.gz)
    Source code(zip)
    checksums.txt(194 bytes)
    n9e-v5.14.4-linux-amd64.tar.gz(15.02 MB)
    n9e-v5.14.4-linux-arm64.tar.gz(14.32 MB)
  • v5.14.3(Dec 13, 2022)

    What's Changed

    • feat: add Telegram Bot notification support by @Mystery00 in https://github.com/ccfos/nightingale/pull/1295
    • fix: webapi conf sso section typo by @Doublemine in https://github.com/ccfos/nightingale/pull/1298
    • update dashboard template for mongodb by @lunuan in https://github.com/ccfos/nightingale/pull/1293
    • replace label host to ident by @hagic-hhj in https://github.com/ccfos/nightingale/pull/1302
    • support fetch user group by user name by @lsy1990 in https://github.com/ccfos/nightingale/pull/1311
    • fix: support redis sentinel password by @zhousbo in https://github.com/ccfos/nightingale/pull/1315
    • n9e server support multi cluster alert by @710leo in https://github.com/ccfos/nightingale/pull/1318
    • fix: multi-cluster alerting bug
    • feat: support connecting a single n9e-server process to multiple time-series databases

    New Contributors

    • @Mystery00 made their first contribution in https://github.com/ccfos/nightingale/pull/1295
    • @Doublemine made their first contribution in https://github.com/ccfos/nightingale/pull/1298
    • @lunuan made their first contribution in https://github.com/ccfos/nightingale/pull/1293
    • @zhousbo made their first contribution in https://github.com/ccfos/nightingale/pull/1315

    Full Changelog: https://github.com/ccfos/nightingale/compare/v5.14.2...v5.14.3

    alter table alerting_engines drop index instance;
    
    Source code(tar.gz)
    Source code(zip)
    checksums.txt(194 bytes)
    n9e-v5.14.3-linux-amd64.tar.gz(15.02 MB)
    n9e-v5.14.3-linux-arm64.tar.gz(14.32 MB)
  • v5.14.2(Nov 23, 2022)

    What's Changed

    • feat: make the alert script timeout configurable by @JellyTony in https://github.com/ccfos/nightingale/pull/1253
    • replace json with easyjson for router by @kongfei605 in https://github.com/ccfos/nightingale/pull/1261
    • feat: add timeseries sample log filter by @710leo in https://github.com/ccfos/nightingale/pull/1281
    • refactor: use SafeList instead of channel as queue
    • add cas and oauth2 login entry

    Configuration changes

    • Remove ForceUseServerTS from server.conf
    • Adjust the WriterOpt section of server.conf as follows:
    [WriterOpt]
    # queue channel count
    QueueCount = 1000
    # queue max size
    QueueMaxSize = 1000000
    # once pop samples number from queue
    QueuePopSize = 1000
    # metric or ident
    ShardingKey = "ident"
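
The ShardingKey comment says samples are sharded by metric or by ident. As an illustration of what key-based sharding means, the sketch below hashes the chosen key to pick one of QueueCount queues, so that all samples sharing an ident land in the same queue. The hash function and the sample layout are assumptions, not n9e's actual implementation:

```python
import zlib


def pick_queue(sample: dict, sharding_key: str, queue_count: int) -> int:
    """Map a sample to a queue index by hashing the value of its sharding key."""
    key = sample.get(sharding_key, "")
    return zlib.crc32(key.encode("utf-8")) % queue_count


a = {"metric": "cpu_usage_idle", "ident": "host-01"}
b = {"metric": "mem_available_percent", "ident": "host-01"}
# Same ident -> same queue, regardless of metric.
print(pick_queue(a, "ident", 1000) == pick_queue(b, "ident", 1000))  # True
```

Sharding by ident keeps all series of one host in order within a single queue, while sharding by metric spreads a busy host across queues.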
    

    Full Changelog: https://github.com/ccfos/nightingale/compare/v5.14.1...v5.14.2

    Source code(tar.gz)
    Source code(zip)
    checksums.txt(194 bytes)
    n9e-v5.14.2-linux-amd64.tar.gz(14.98 MB)
    n9e-v5.14.2-linux-arm64.tar.gz(14.28 MB)
  • v5.14.1(Nov 3, 2022)

    What's Changed

    • alert mute cannot refresh the bug by @chenginger in https://github.com/ccfos/nightingale/pull/1242
    • feat:CAS and OAuth2 login by @foursevenlove in https://github.com/ccfos/nightingale/pull/1236

    New Contributors

    • @chenginger made their first contribution in https://github.com/ccfos/nightingale/pull/1242
    • @foursevenlove made their first contribution in https://github.com/ccfos/nightingale/pull/1236

    Full Changelog: https://github.com/ccfos/nightingale/compare/v5.14.0...v5.14.1

    Source code(tar.gz)
    Source code(zip)
    checksums.txt(194 bytes)
    n9e-v5.14.1-linux-amd64.tar.gz(14.88 MB)
    n9e-v5.14.1-linux-arm64.tar.gz(14.19 MB)
  • v5.14.0(Oct 25, 2022)

    What's Changed

    • feat: conf file password supports ciphertext by @tanxiao1990 in https://github.com/ccfos/nightingale/pull/1207
    • docs: pg init sql by @tanxiao1990 in https://github.com/ccfos/nightingale/pull/1210
    • feat: alert rule supports variables by @bbaobelief in https://github.com/ccfos/nightingale/pull/1217
    • feat: dashboard supports identifier field

    Full Changelog: https://github.com/ccfos/nightingale/compare/v5.13.1...v5.14.0

    Source code(tar.gz)
    Source code(zip)
    checksums.txt(194 bytes)
    n9e-v5.14.0-linux-amd64.tar.gz(14.85 MB)
    n9e-v5.14.0-linux-arm64.tar.gz(14.16 MB)
  • v5.13.1(Oct 19, 2022)

    What's Changed

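    • PromClient supports Header configuration by @710leo in https://github.com/ccfos/nightingale/pull/1203
    • Preparing for dashboards to support an English identifier: the backend is done, the frontend is not yet; the table schema needs to change, and the SQL is below
    • The frontend has many changes, mainly dashboard optimizations; if you have not run into these problems, there is no need to upgrade

    Frontend changes

    https://github.com/n9e/fe-v5/releases/tag/v5.13.2

    • feat: dashboards support configuring a default cluster individually
    • feat: dashboard table panels support filtering
    • refactor: improve line chart tooltip positioning
    • refactor: improve the multi-select and select-all interaction for dashboard variables
    • refactor: improve gauge chart layout calculation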

    Full Changelog: https://github.com/ccfos/nightingale/compare/v5.13.0...v5.13.1

    alter table board add column ident varchar(200) not null default '' after name;
    alter table board add index index_ident(ident);
    
    Source code(tar.gz)
    Source code(zip)
    checksums.txt(194 bytes)
    n9e-v5.13.1-linux-amd64.tar.gz(14.85 MB)
    n9e-v5.13.1-linux-arm64.tar.gz(14.16 MB)
  • v5.13.0(Oct 12, 2022)

  • v5.12.0(Sep 26, 2022)

    What's Changed

    • docs: fix pg init sql by @tanxiao1990 in https://github.com/ccfos/nightingale/pull/1154
    • feat: add configs service api by @710leo in https://github.com/ccfos/nightingale/pull/1155
    • feat: support for sharing dashboards by @kurolz in https://github.com/ccfos/nightingale/pull/1150
    • bug: user update by multifields, param need '...' by @gengleiming in https://github.com/ccfos/nightingale/pull/1170

    New Contributors

    • @kurolz made their first contribution in https://github.com/ccfos/nightingale/pull/1150
    • @gengleiming made their first contribution in https://github.com/ccfos/nightingale/pull/1170

    Full Changelog: https://github.com/ccfos/nightingale/compare/v5.11.3...v5.12.0

    alter table board add column `public` tinyint(1) not null default 0 comment '0:false 1:true' after tags;
    
    Source code(tar.gz)
    Source code(zip)
    checksums.txt(194 bytes)
    n9e-v5.12.0-linux-amd64.tar.gz(14.71 MB)
    n9e-v5.12.0-linux-arm64.tar.gz(14.02 MB)
  • v5.11.3(Sep 7, 2022)

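    Frontend changes

    • feat: add an alerting engine management page
    • feat: alert mute rules can be edited and enabled or disabled @SunnyBoy-WYH
    • refactor: the alert rule effective time range now uses two time pickers, fixing the earlier limitation that the start time could not be later than the end time
    • refactor: rework the dashboard variable management page
      • feat: add custom and constant types
      • refactor: split the list and edit pages
    • refactor: dashboard upgrade
      • feat: hexbin charts can choose what to display (name and value, name, value)
      • feat: line chart Y axis supports logarithmic scale
      • feat: stat charts can map colors via threshold settings (valueMappings settings also apply and take precedence over thresholds)
      • refactor: line chart start and end times now follow the queried time range (previously the time range of the returned data)
      • refactor: improve the PromQL input box in dashboard charts so that requests are not triggered before input is complete
      • fix: table charts could not be sorted in some display modes
      • fix: stat chart view displayed incorrectly when switching color mode while editing
    • fix: in instant query Graph mode, re-running the same query did not fetch the latest data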

    Backend changes

    • fix: alert mute clean by @710leo in https://github.com/ccfos/nightingale/pull/1140
    • feat: compatible with redis4 to 7 by @tanxiao1990 in https://github.com/ccfos/nightingale/pull/1141
    • docs: pg init sql by @tanxiao1990 in https://github.com/ccfos/nightingale/pull/1142
    • feat: alert-mute support edit and disable by @SunnyBoy-WYH in https://github.com/ccfos/nightingale/pull/1144
    • feat: alert_subscribe add name and disabled by @tanxiao1990 in https://github.com/ccfos/nightingale/pull/1145

    Full Changelog: https://github.com/ccfos/nightingale/compare/v5.11.2...v5.11.3

    Required SQL changes

    alter table alert_mute add column `note` varchar(1024) not null default '' after `prod`;
    alter table alert_mute add column `disabled` tinyint(1) not null default 0 comment '0:enabled 1:disabled' after `etime`;
    alter table alert_mute add column `update_at` bigint not null default 0 after `create_by`;
    alter table alert_mute add column `update_by` varchar(64) not null default '' after `update_at`;
    alter table alert_subscribe add column `name` varchar(255) not null default '' after `id`;
    alter table alert_subscribe add column `disabled` tinyint(1) not null default 0 comment '0:enabled 1:disabled' after `name`;
    
    Source code(tar.gz)
    Source code(zip)
    checksums.txt(194 bytes)
    n9e-v5.11.3-linux-amd64.tar.gz(14.70 MB)
    n9e-v5.11.3-linux-arm64.tar.gz(14.01 MB)
  • v5.11.2(Aug 31, 2022)

  • v5.11.1(Aug 27, 2022)

    What's Changed

    • fix: add board check when del group by @xiaoziv in https://github.com/ccfos/nightingale/pull/1124
    • add configuration ForceUseServerTS by @UlricQin in https://github.com/ccfos/nightingale/pull/1128
    • Commuinity guide by @laiwei in https://github.com/ccfos/nightingale/pull/1133
    • add id column for table user_group_member and role_operation by @xiaoziv in https://github.com/ccfos/nightingale/pull/1126
    • bugfix: server heartbeat via database

    Full Changelog: https://github.com/ccfos/nightingale/compare/v5.11.0...v5.11.1

    Source code(tar.gz)
    Source code(zip)
    checksums.txt(194 bytes)
    n9e-v5.11.1-linux-amd64.tar.gz(14.64 MB)
    n9e-v5.11.1-linux-arm64.tar.gz(13.95 MB)
  • v5.11.0(Aug 22, 2022)

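    Preface

    This release bundles a number of recently accumulated PRs. It also switches the n9e-server heartbeat to the database, which prepares for a planned feature: configuring datasources in the web UI and managing the association between n9e-server and datasources there.

    Backend updates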

    • support tpls reload by @xiaoziv in https://github.com/ccfos/nightingale/pull/1104
    • use slim base image by @xiaoziv in https://github.com/ccfos/nightingale/pull/1105
    • Update docker-compose.yaml by @JellyTony in https://github.com/ccfos/nightingale/pull/1107
    • optimize error report by @xiaoziv in https://github.com/ccfos/nightingale/pull/1109
    • add Chinese descriptions for ping monitoring metrics by @nondevops in https://github.com/ccfos/nightingale/pull/1110
    • add Chinese descriptions for aws cloudwatch rds metrics by @mofrom in https://github.com/ccfos/nightingale/pull/1111
    • add alert rule execution logs by @bbaobelief in https://github.com/ccfos/nightingale/pull/1112
    • improve community governance by @laiwei in https://github.com/ccfos/nightingale/pull/1115
    • feat: support handle event service api by @710leo in https://github.com/ccfos/nightingale/pull/1113
    • read prom url from database by @UlricQin in https://github.com/ccfos/nightingale/pull/1119
    • docs: sync pg init sql by @tanxiao1990 in https://github.com/ccfos/nightingale/pull/1122
    • feat: alert rule support cate by @710leo in https://github.com/ccfos/nightingale/pull/1123
    • Mgmt server cluster name in web by @UlricQin in https://github.com/ccfos/nightingale/pull/1127

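    Frontend updates

    • feat: stat charts support choosing the value field (yes, label values can now be displayed)
    • feat: chart titles, descriptions, link settings, group titles and so on can reference dashboard variable values (both the $var and ${var} forms are supported)
    • refactor: improve spacing between the line chart title and content
    • refactor: improve the chart query input box so that a query is not triggered before input is complete, which caused API errors
    • refactor: adjust the width of dashboard variable dropdowns to show content as fully as possible
    • fix: dashboard data was not refreshed after switching the dashboard cluster
    • fix: table rendering error #175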
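
The variable references mentioned for chart titles can be illustrated with a small sketch. This is not the n9e frontend code; `render_title` and its regular expression are hypothetical, and the sketch assumes variable names consist of word characters and that unknown names are left as written.

```python
import re

# Hypothetical pattern: "${name}" or "$name", where name is made of word characters.
_VAR_PATTERN = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def render_title(template: str, variables: dict[str, str]) -> str:
    """Replace $var and ${var} references; unknown names are left untouched."""

    def substitute(match: re.Match) -> str:
        name = match.group(1) or match.group(2)
        # Fall back to the original text when the variable is not defined.
        return variables.get(name, match.group(0))

    return _VAR_PATTERN.sub(substitute, template)
```

For example, `render_title("CPU of $ident in ${cluster}", {"ident": "web-01", "cluster": "prod"})` yields a title with both references filled in.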

    New Contributors

    • @JellyTony made their first contribution in https://github.com/ccfos/nightingale/pull/1107
    • @nondevops made their first contribution in https://github.com/ccfos/nightingale/pull/1110
    • @mofrom made their first contribution in https://github.com/ccfos/nightingale/pull/1111

    SQL

    alter table alert_rule add column cate varchar(128) not null default '' after group_id;
    alter table alert_mute add column cate varchar(128) not null default '' after group_id;
    alter table alert_subscribe add column cate varchar(128) not null default '' after group_id;
    alter table alert_cur_event add column cate varchar(128) not null default '' after group_id;
    alter table alert_his_event add column cate varchar(128) not null default '' after group_id;
    
    CREATE TABLE `alerting_engines`
    (
        `id` int unsigned NOT NULL AUTO_INCREMENT,
        `instance` varchar(128) not null default '' comment 'instance identification, e.g. 10.9.0.9:9090',
        `cluster` varchar(128) not null default '' comment 'target reader cluster',
        `clock` bigint not null,
        PRIMARY KEY (`id`),
        UNIQUE KEY (`instance`)
    ) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
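
To make the database heartbeat idea concrete, here is a rough sketch of how a table shaped like `alerting_engines` could be used. It is an illustration only, not n9e-server's implementation: it uses SQLite in place of MySQL, and `create_schema`, `record_heartbeat`, `live_instances` and the 30-second freshness window are all assumptions.

```python
import sqlite3
import time


def create_schema(conn: sqlite3.Connection) -> None:
    # SQLite approximation of the alerting_engines table shown above.
    conn.execute(
        "CREATE TABLE alerting_engines ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "instance TEXT NOT NULL UNIQUE, "
        "cluster TEXT NOT NULL DEFAULT '', "
        "clock INTEGER NOT NULL)"
    )


def record_heartbeat(conn, instance, cluster, now=None):
    """Insert or refresh the row for one engine instance."""
    clock = int(time.time()) if now is None else now
    conn.execute(
        "INSERT INTO alerting_engines (instance, cluster, clock) VALUES (?, ?, ?) "
        "ON CONFLICT(instance) DO UPDATE SET "
        "cluster = excluded.cluster, clock = excluded.clock",
        (instance, cluster, clock),
    )
    conn.commit()


def live_instances(conn, cluster, now, max_age_seconds=30):
    """Return instances of a cluster whose heartbeat is recent enough."""
    rows = conn.execute(
        "SELECT instance FROM alerting_engines "
        "WHERE cluster = ? AND clock >= ? ORDER BY instance",
        (cluster, now - max_age_seconds),
    )
    return [row[0] for row in rows]
```

The unique key on `instance` is what lets each engine keep a single row that it refreshes, so stale engines can be detected by comparing `clock` with the current time.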
    

    Full Changelog: https://github.com/ccfos/nightingale/compare/v5.10.3...v5.11.0

    Source code(tar.gz)
    Source code(zip)
    checksums.txt(194 bytes)
    n9e-v5.11.0-linux-amd64.tar.gz(14.64 MB)
    n9e-v5.11.0-linux-arm64.tar.gz(13.95 MB)
  • v5.10.3(Aug 11, 2022)

    What's Changed

    • feat: prom support tls by @710leo in https://github.com/ccfos/nightingale/pull/1091
    • feat: support i18n request headerkey by @xiaoziv in https://github.com/ccfos/nightingale/pull/1094
    • fix i18n header bug by @xiaoziv in https://github.com/ccfos/nightingale/pull/1095
    • feat: support i18n metric desc by @xiaoziv in https://github.com/ccfos/nightingale/pull/1097
    • feat: add write_relabel action before n9e remote writing to multi tsdb by @resurgence72 in https://github.com/ccfos/nightingale/pull/1098
    • feat: support ident disk usage metric by @xiaoziv in https://github.com/ccfos/nightingale/pull/1100

    n9e-fe Changed

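    • feat: the object list table adds status, load, and memory columns
    • feat: improved Grafana dashboard import #162
      • support importing bargauge and text panels
      • support importing textbox and custom variable types
    • feat: relax the constraints on query-type dashboard variable definitions; label + value can be defined as a variable, for more flexibility
    • refactor: remove the "PromQL" prefix text from the instant query input box, and fix the inability to run the same promql repeatedly in Table mode
    • fix: dashboard table cell text color did not match the valueMapping settings in some cases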

    Full Changelog: https://github.com/ccfos/nightingale/compare/v5.10.2...v5.10.3

    add configurations in webapi.conf

    [TargetMetrics]
    TargetUp = '''max(max_over_time(target_up{ident=~"(%s)"}[%dm])) by (ident)'''
    LoadPerCore = '''max(max_over_time(system_load_norm_1{ident=~"(%s)"}[%dm])) by (ident)'''
    MemUtil = '''100-max(max_over_time(mem_available_percent{ident=~"(%s)"}[%dm])) by (ident)'''
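
Each `TargetMetrics` template contains a `%s` and a `%d` placeholder. The sketch below shows one plausible way such a template expands into a PromQL query. It assumes, without confirmation from the release notes, that `%s` receives the target idents joined with `|` and `%d` receives a lookback window in minutes; `build_query` is a hypothetical helper.

```python
# Template copied from the TargetUp setting above.
TARGET_UP = 'max(max_over_time(target_up{ident=~"(%s)"}[%dm])) by (ident)'


def build_query(template: str, idents: list[str], minutes: int) -> str:
    """Fill the %s slot with an ident alternation and the %d slot with minutes."""
    # Assumption: idents are plain names; regex metacharacters are not escaped here.
    ident_regex = "|".join(idents)
    return template % (ident_regex, minutes)
```

With `["host-a", "host-b"]` and a 3-minute window, the matcher becomes `ident=~"(host-a|host-b)"` and the range selector becomes `[3m]`.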
    
    New Contributors

    Thank you for contributions

    • @resurgence72 made their first contribution in https://github.com/ccfos/nightingale/pull/1098

    Source code(tar.gz)
    Source code(zip)
    checksums.txt(194 bytes)
    n9e-v5.10.3-linux-amd64.tar.gz(14.61 MB)
    n9e-v5.10.3-linux-arm64.tar.gz(13.92 MB)
  • v5.10.2(Aug 6, 2022)

    What's Changed

    • feat: add first trigger time by @tanxiao1990 in https://github.com/ccfos/nightingale/pull/1086
    • feat: support rule convert from prometheus/vmalert by @xiaoziv in https://github.com/ccfos/nightingale/pull/1087
    • feat: support alert graph url by @xiaoziv in https://github.com/ccfos/nightingale/pull/1088
    • remove record rule check by @xiaoziv in https://github.com/ccfos/nightingale/pull/1090

    n9e-fe Changed

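    • feat: dashboard query variables support a custom "all" value #163
    • feat: instant query adds an autocomplete toggle #145
    • refactor: instant query no longer loads metric name data by default on page load; it is loaded only after a click #145
    • refactor: improve the visibility of line chart thresholds in dark mode, which were unclear on some devices #158
    • fix: dashboard edits were overwritten when several people edited at the same time
    • fix: error when sharing a chart from instant query #158
    • fix: custom time of a target displayed incorrectly in the chart editor #157
    • fix: API error when a read-only user expanded a group on a dashboard #150
    • fix: alerts could not be muted after navigating from active alerts #159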

    Full Changelog: https://github.com/ccfos/nightingale/compare/v5.10.1...v5.10.2

    alter TABLE alert_cur_event add COLUMN `first_trigger_time` bigint AFTER target_note; 
    alter TABLE alert_his_event add COLUMN `first_trigger_time` bigint AFTER target_note;
    
    Source code(tar.gz)
    Source code(zip)
    checksums.txt(194 bytes)
    n9e-v5.10.2-linux-amd64.tar.gz(14.59 MB)
    n9e-v5.10.2-linux-arm64.tar.gz(13.90 MB)
  • v5.10.1(Aug 2, 2022)

  • v5.10.0(Aug 1, 2022)

    What's Changed

    • refactor: add error log by @tanxiao1990 in https://github.com/ccfos/nightingale/pull/1076
    • refactor: error info return by @tanxiao1990 in https://github.com/ccfos/nightingale/pull/1077
    • docker compose use latest version of n9e and categraf by @kongfei605 in https://github.com/ccfos/nightingale/pull/1079
    • include new UI

    Full Changelog: https://github.com/ccfos/nightingale/compare/v5.9.8...v5.10.0

    Source code(tar.gz)
    Source code(zip)
    checksums.txt(194 bytes)
    n9e-v5.10.0-linux-amd64.tar.gz(14.58 MB)
    n9e-v5.10.0-linux-arm64.tar.gz(13.89 MB)
  • v5.9.7(Jul 27, 2022)

    What's Changed

    • [feature] add proxy auth support by @xiaoziv in https://github.com/ccfos/nightingale/pull/1035
    • fix: fix version info by @tanxiao1990 in https://github.com/ccfos/nightingale/pull/1036
    • fix: fix plugin error by @tanxiao1990 in https://github.com/ccfos/nightingale/pull/1038
    • Feature mute enhancement by @xiaoziv in https://github.com/ccfos/nightingale/pull/1041
    • get alert-rule node by @bbaobelief in https://github.com/ccfos/nightingale/pull/1042
    • fix mute: parse regexp by @UlricQin in https://github.com/ccfos/nightingale/pull/1044
    • [feat(#984)] multiple cluster support by @xiaoziv in https://github.com/ccfos/nightingale/pull/1045
    • keep build version in Makefile consistency with goreleaser by @kongfei605 in https://github.com/ccfos/nightingale/pull/1047
    • [feature] support multiple cluster config with mute&subscribe by @xiaoziv in https://github.com/ccfos/nightingale/pull/1046
    • Query batch feature by @SunnyBoy-WYH in https://github.com/ccfos/nightingale/pull/1052
    • update community governance by @laiwei in https://github.com/ccfos/nightingale/pull/1056
    • fix: event push api by @710leo in https://github.com/ccfos/nightingale/pull/1057
    • fix get alert rules by api by @710leo in https://github.com/ccfos/nightingale/pull/1059
    • [fix] fix the docker problem of apple chip by @Hwloser in https://github.com/ccfos/nightingale/pull/1060
    • supply plugin to notify maintainer by @lsy1990 in https://github.com/ccfos/nightingale/pull/1063
    • code refactor notify plugin by @UlricQin in https://github.com/ccfos/nightingale/pull/1065
    • code refactor notify by @UlricQin in https://github.com/ccfos/nightingale/pull/1066
    • modify prometheus query batch response format by @UlricQin in https://github.com/ccfos/nightingale/pull/1068
    • feat: push event api support mute by @710leo in https://github.com/ccfos/nightingale/pull/1070
    • fix proxy auth username error by @xiaoziv in https://github.com/ccfos/nightingale/pull/1072
    • add api: /board/:bid/pure by @UlricQin in https://github.com/ccfos/nightingale/pull/1073

    New Contributors

    • @xiaoziv made their first contribution in https://github.com/ccfos/nightingale/pull/1035
    • @SunnyBoy-WYH made their first contribution in https://github.com/ccfos/nightingale/pull/1052
    • @Hwloser made their first contribution in https://github.com/ccfos/nightingale/pull/1060
    • @lsy1990 made their first contribution in https://github.com/ccfos/nightingale/pull/1063

    Full Changelog: https://github.com/ccfos/nightingale/compare/v5.9.6...v5.9.7

    Source code(tar.gz)
    Source code(zip)
    checksums.txt(192 bytes)
    n9e-v5.9.7-linux-amd64.tar.gz(14.07 MB)
    n9e-v5.9.7-linux-arm64.tar.gz(13.38 MB)
  • v5.9.6(Jul 8, 2022)

  • v5.9.5(Jul 7, 2022)

    What's Changed

    • Add recording rule by @tripitakav in https://github.com/ccfos/nightingale/pull/1015
    • add community guide and governance docs (draft) by @laiwei in https://github.com/ccfos/nightingale/pull/1019
    • refactor recording rule and add field disabled by @UlricQin in https://github.com/ccfos/nightingale/pull/1022
    • fix: fix event api for service by @tanxiao1990 in https://github.com/ccfos/nightingale/pull/1026
    • report sample queue size by @UlricQin in https://github.com/ccfos/nightingale/pull/1027
    • add rulename as mute field by @tripitakav in https://github.com/ccfos/nightingale/pull/1025
    • fix get clusters by api by @710leo in https://github.com/ccfos/nightingale/pull/1030
    • auto release with github action by @kongfei605 in https://github.com/ccfos/nightingale/pull/1032

    New Contributors

    • @tripitakav made their first contribution in https://github.com/ccfos/nightingale/pull/1015
    • @kongfei605 made their first contribution in https://github.com/ccfos/nightingale/pull/1032

    Full Changelog: https://github.com/ccfos/nightingale/compare/v5.9.4...v5.9.5

    SQL

    insert into `role_operation`(role_name, operation) values('Standard', '/recording-rules');
    insert into `role_operation`(role_name, operation) values('Standard', '/recording-rules/add');
    insert into `role_operation`(role_name, operation) values('Standard', '/recording-rules/put');
    insert into `role_operation`(role_name, operation) values('Standard', '/recording-rules/del');
    
    CREATE TABLE `recording_rule` (
        `id` bigint unsigned not null auto_increment,
        `group_id` bigint not null default '0' comment 'group_id',
        `cluster` varchar(128) not null,
        `name` varchar(255) not null comment 'new metric name',
        `note` varchar(255) not null comment 'rule note',
        `disabled` tinyint(1) not null comment '0:enabled 1:disabled',
        `prom_ql` varchar(8192) not null comment 'promql',
        `prom_eval_interval` int not null comment 'evaluate interval',
        `append_tags` varchar(255) default '' comment 'split by space: service=n9e mod=api',
        `create_at` bigint default '0',
        `create_by` varchar(64) default '',
        `update_at` bigint default '0',
        `update_by` varchar(64) default '',
        PRIMARY KEY (`id`),
        KEY `group_id` (`group_id`),
        KEY `update_at` (`update_at`)
    ) ENGINE=InnoDB DEFAULT CHARSET = utf8mb4;
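
The `append_tags` column comment describes a space-separated list of `key=value` pairs. A minimal sketch of reading that format follows; `parse_append_tags` is a hypothetical helper, and rejecting malformed items is an assumption rather than documented n9e behavior.

```python
def parse_append_tags(raw: str) -> dict[str, str]:
    """Parse 'service=n9e mod=api' into {'service': 'n9e', 'mod': 'api'}."""
    tags: dict[str, str] = {}
    for item in raw.split():  # split on any run of whitespace
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed tag: {item!r}")
        tags[key] = value
    return tags
```

An empty string parses to an empty dictionary, which matches the column's default of `''`.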
    
    Source code(tar.gz)
    Source code(zip)
    checksums.txt(192 bytes)
    n9e-v5.9.5-linux-amd64.tar.gz(14.04 MB)
    n9e-v5.9.5-linux-arm64.tar.gz(13.35 MB)
  • v5.9.4(Jul 5, 2022)

    What's Changed

    • refactor: use categraf as collector in docker compose by @tanxiao1990 in https://github.com/ccfos/nightingale/pull/993
    • fix ForDuration by @bbaobelief in https://github.com/ccfos/nightingale/pull/999 ❗️important
    • fix typo by @JacoobH in https://github.com/ccfos/nightingale/pull/1004
    • update kafka alerts and dashboard by @ysyneu in https://github.com/ccfos/nightingale/pull/1012
    • feat: persist notify cur number by @tanxiao1990 in https://github.com/ccfos/nightingale/pull/1013 ❗️important

    New Contributors

    • @JacoobH made their first contribution in https://github.com/ccfos/nightingale/pull/1004

    SQL

    alter table alert_cur_event add column `notify_cur_number` int not null default 0 comment '' after notify_groups;
    alter table alert_his_event add column `notify_cur_number` int not null default 0 comment '' after notify_groups;
    

    Full Changelog: https://github.com/ccfos/nightingale/compare/v5.9.3...v5.9.4

    Source code(tar.gz)
    Source code(zip)
    n9e-5.9.4.tar.gz(16.49 MB)
  • v5.9.3(Jun 27, 2022)

    What's Changed

    • Fix:fix target_up nodata judge for prometheus scrape by @tanxiao1990 in https://github.com/ccfos/nightingale/pull/986
    • fix alert put api not verify bug by @chenxuan520 in https://github.com/ccfos/nightingale/pull/987
    • Feat:update docker-compose from telegraf to categraf by @tanxiao1990 in https://github.com/ccfos/nightingale/pull/992

    New Contributors

    • @chenxuan520 made their first contribution in https://github.com/ccfos/nightingale/pull/987

    Full Changelog: https://github.com/ccfos/nightingale/compare/v5.9.2...v5.9.3

    how to upgrade:

    1. backup your custom configuration files
    2. wget tarball, untar, replace files
    3. modify configuration files for your env
    4. restart n9e-webapi and n9e-server
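
Step 1 of the upgrade procedure (backing up custom configuration) could be scripted roughly as follows. This is only a sketch: it assumes the configuration files live in an `etc` directory under the install directory, and `backup_configs` and the backup naming scheme are hypothetical.

```python
import shutil
import time
from pathlib import Path


def backup_configs(install_dir, backup_root, stamp=None):
    """Copy <install_dir>/etc to <backup_root>/n9e-etc-<stamp> and return the new path."""
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    source = Path(install_dir) / "etc"  # assumed location of server.conf / webapi.conf
    target = Path(backup_root) / f"n9e-etc-{stamp}"
    # copytree creates missing parent directories and fails if target already exists.
    shutil.copytree(source, target)
    return target
```

Running it before unpacking the new tarball keeps a timestamped copy to compare against when re-applying environment-specific settings in step 3.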

    You can download the binary on gitlink: https://www.gitlink.org.cn/ccfos/nightingale/releases

    Source code(tar.gz)
    Source code(zip)
  • v5.9.2(Jun 15, 2022)

    • Feat: Notify maintainers when n9e-server occurs error
    • Feat: Support redis cluster and sentinel mode
    • Feat: Add init sql of postgres
    • Feat: Add some common template functions
    • Feat: alert_aggr_view support modify operation by admin role
    • Feat: Forward samples to backends in sequence
    • Feat: Add notify_max_number for alert rule
    • Feat: Add some dashboards json and alerts json of categraf

    alter table alert_mute add column `prod` varchar(255) not null default '' after group_id;
    alter table users add column `maintainer` tinyint(1) not null default 0 after contacts;
    alter table alert_rule add column `notify_max_number` int not null default 0 comment '' after notify_repeat_step;
    

    how to upgrade:

    1. backup your custom configuration files
    2. wget tarball, untar, replace files
    3. modify configuration files for your env
    4. execute the sql commands
    5. restart n9e-webapi and n9e-server
    Source code(tar.gz)
    Source code(zip)
    n9e-5.9.2.tar.gz(15.84 MB)
  • v5.8.0(May 21, 2022)

    • Feat: Update target's cluster field when clustername modified in server.conf
    • Feat: Add wait tool for docker-compose to improve startup success rate
    • Feat: Use alert_rule_note as template and support prometheus style variables
    • Feat: Support the new dashboard; it requires manual dashboard migration
    • Feat: Add some table columns to support the algorithm alarm function that may be developed in the future

    how to upgrade:

    1. backup your custom configurations
    2. wget tarball, untar, replace files
    3. modify config files for your env
    4. execute the sql commands
    5. restart n9e-webapi and n9e-server
    6. migrate dashboards on the /help/migrate page (very important!)

    CREATE TABLE `board` (
        `id` bigint unsigned not null auto_increment,
        `group_id` bigint not null default 0 comment 'busi group id',
        `name` varchar(191) not null,
        `tags` varchar(255) not null comment 'split by space',
        `create_at` bigint not null default 0,
        `create_by` varchar(64) not null default '',
        `update_at` bigint not null default 0,
        `update_by` varchar(64) not null default '',
        PRIMARY KEY (`id`),
        UNIQUE KEY (`group_id`, `name`)
    ) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
    
    CREATE TABLE `board_payload` (
        `id` bigint unsigned not null comment 'dashboard id',
        `payload` mediumtext not null,
        UNIQUE KEY (`id`)
    ) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
    
    alter table alert_rule add column `prod` varchar(255) not null default '' after note;
    alter table alert_rule add column `algorithm` varchar(255) not null default '' after prod;
    alter table alert_rule add column `algo_params` varchar(255) after algorithm;
    alter table alert_rule add column `delay` int not null default 0 after algo_params;
    alter table alert_cur_event add column `rule_prod` varchar(255) not null default '' after rule_note;
    alter table alert_cur_event add column `rule_algo` varchar(255) not null default '' after rule_prod;
    alter table alert_his_event add column `rule_prod` varchar(255) not null default '' after rule_note;
    alter table alert_his_event add column `rule_algo` varchar(255) not null default '' after rule_prod;
    alter table alert_cur_event modify column rule_note varchar(2048) not null default 'alert rule note';
    alter table alert_his_event modify column rule_note varchar(2048) not null default 'alert rule note';
    alter table alert_rule modify column note varchar(1024) not null default '';
    
    Source code(tar.gz)
    Source code(zip)
    n9e-5.8.0.tar.gz(16.16 MB)
  • v5.7.0(Apr 28, 2022)

    • Feat: support noat parameter for dingtalk mediatype
    • Feat: support configurations to control writer's queue count
    • Feat: support redis tls client
    • Feat: admin user can modify builtin metric_view

    how to upgrade:

    1. backup your custom configurations
    2. wget tarball, untar, replace files
    3. modify config files for your env
    4. restart n9e-webapi and n9e-server
    Source code(tar.gz)
    Source code(zip)
    n9e-5.7.0.tar.gz(11.93 MB)
  • v5.6.3(Apr 18, 2022)

    • Feat: Modify NotifyBuiltinEnable to NotifyBuiltinChannels in server.conf
    • Feat: Use a separate channel to handle metric target_up

    how to upgrade:

    1. backup your custom configurations
    2. wget tarball, untar, replace files
    3. modify config files for your env
    4. restart n9e-webapi and n9e-server
    Source code(tar.gz)
    Source code(zip)
    n9e-5.6.3.tar.gz(11.92 MB)
  • v5.6.2(Apr 14, 2022)

  • v5.6.1(Apr 14, 2022)

  • v5.6.0(Apr 8, 2022)

    • New: alter table user rename as users for pg
    • New: Add builtin jmx_exporter dashboard
    • New: Add builtin linux_by_telegraf dashboard
    • New: Add builtin elasticsearch_by_telegraf dashboard
    • New: Add builtin mongo_by_telegraf dashboard
    • New: Add builtin process_by_telegraf dashboard
    • New: Add builtin linux_by_telegraf alerts
    • New: Use hostname+pid instead of IP as heartbeat identity
    • New: Add builtin metric_view and alert_aggr_view
    • Fix: Logic bug of rule.NotifyRecovered
    • Fix: List builtin dashboards and alerts
    • Fix: Fix order of metric_view and alert_aggr_view

    alter table user rename as users;
    
    delete from metric_view;
    
    insert into metric_view(name, cate, configs) values('Host View', 0, '{"filters":[{"oper":"=","label":"__name__","value":"cpu_usage_idle"}],"dynamicLabels":[],"dimensionLabels":[{"label":"ident","value":""}]}');
    
    delete from alert_aggr_view;
    
    insert into alert_aggr_view(name, rule, cate) values('By BusiGroup, Severity', 'field:group_name::field:severity', 0);
    insert into alert_aggr_view(name, rule, cate) values('By RuleName', 'field:rule_name', 0);
    

    how to upgrade:

    1. backup your custom configurations
    2. wget tarball, untar, replace files
    3. alter table
    4. modify config files for your env
    5. restart n9e-webapi and n9e-server
    Source code(tar.gz)
    Source code(zip)
    n9e-5.6.0.tar.gz(12.21 MB)
Owner: DiDi (滴滴出行)