Kubernetes monitor

Overview

Mode overview

  • The corresponding config option is collect_mode; pick one of cadvisor_plugin | kubelet_agent | server_side
  • All three modes share a single codebase
| Mode | How it runs | collect_mode | Notes |
| --- | --- | --- | --- |
| Nightingale plugin scraping the cadvisor raw api | executable plugin invoked by the n9e agent | cadvisor_plugin | documented at the bottom of this readme (the legacy cadvisor mode) |
| Container base-resource metric collection | k8s daemonset, deployed on every node | kubelet_agent | collectively called the new mode (the kubelet address is derived from the metrics port listen address) |
| Centralized collection of k8s service components | k8s deployment | server_side | collectively called the new mode |
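
For example, a minimal config-file fragment selecting the daemonset mode would look like this (the collect_mode key is from the table above; all surrounding keys are omitted):

collect_mode: kubelet_agent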

Advantages of k8s-mon over Prometheus

  • Histogram quantiles are pre-computed, saving storage and reducing server-side load (high-cardinality series usually come from the flood of histogram buckets, which see little use beyond quantile computation, e.g. the occasional distribution analysis); a query-level comparison follows this list
  • Base-resource metrics are pre-computed, and counter-to-gauge conversion simplifies the expressions you ultimately query
  • Container base-resource and k8s resource metrics are bound to the Nightingale service tree
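
To illustrate the first point, compare the query-time cost (the PromQL line is standard Prometheus; exactly how k8s-mon encodes the quantile into the series is described in the histogram section later in this readme):

# raw Prometheus: the quantile is computed at query time across every bucket series
histogram_quantile(0.99, sum(rate(coredns_dns_request_duration_seconds_bucket[5m])) by (le))

# k8s-mon: the quantile is pre-computed at collection time, leaving one series to read
coredns_dns_request_duration_seconds_quantile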

New mode: deployed inside k8s to collect metrics

How it works

  • Scrapes each component's /metrics endpoint for Prometheus-format data, ETLs it, and pushes it to Nightingale
  • Every collection job supports metric and tag whitelist filtering
  • cadvisor data requires holding previous points to compute ratio-type metrics (mostly percentages); the other jobs do not need this
  • Counter-type metrics are converted to gauge type by the n9e agent, i.e. the pushed values are already rates; every counter metric_name gets a _rate suffix (see the sketch after this list)
  • Metric documentation lives in the metrics-detail folder
  • The k8s yaml configs live in k8s-config
  • Multi-instance service components: no user action needed, k8s-mon discovers and collects them automatically
  • Per-node kube-proxy and kubelet-node collection supports configurable concurrency and static sharding
  • Service-component collection adds a func_name tag identifying the job, similar to the Prometheus job label
  • Base metrics get node_ip and node_name tags identifying the host
  • ksm metrics without a nid report to the server_side_nid node by default, e.g. shared metrics like kube_node_status_allocatable_cpu_cores
  • Service-component collection pre-aggregates some metrics (quantiles, averages, success rates); see metrics-detail/preaggregation.md
  • Service-component collection also gathers metrics of the corresponding golang processes (memory, goroutines, etc.); see metrics-detail/process-resource.md
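
A minimal sketch of the counter-to-gauge conversion mentioned above: hold the previous scraped point, then push the per-second delta as the _rate metric. The semantics here are assumed for illustration; this is not k8s-mon's actual code:

package main

import (
	"fmt"
	"time"
)

// sample is one scraped counter value plus its scrape time.
type sample struct {
	value float64
	ts    time.Time
}

// counterToRate turns two successive counter samples into a gauge-style
// per-second rate. Counter resets are dropped rather than corrected.
func counterToRate(prev, cur sample) (rate float64, ok bool) {
	dt := cur.ts.Sub(prev.ts).Seconds()
	if dt <= 0 || cur.value < prev.value { // bad clock or counter reset
		return 0, false
	}
	return (cur.value - prev.value) / dt, true
}

func main() {
	prev := sample{value: 1200, ts: time.Unix(100, 0)}
	cur := sample{value: 1500, ts: time.Unix(130, 0)}
	if r, ok := counterToRate(prev, cur); ok {
		// e.g. a counter named apiserver_request_total would be pushed
		// as apiserver_request_total_rate (name is illustrative)
		fmt.Printf("apiserver_request_total_rate = %.1f/s\n", r) // 10.0/s
	}
}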

What is collected

  • Generally, in a k8s cluster we care about the following 4 categories of metrics:
| Metric category | Source | Example use | Deployment |
| --- | --- | --- | --- |
| Container base-resource metrics | cadvisor built into the kubelet | view container cpu, mem, etc. | k8s daemonset |
| k8s resource metrics | kube-stats-metrics (ksm for short) | view pod status, deployment info, etc. | k8s deployment (ksm must be deployed first) |
| k8s service-component metrics | each component's metrics endpoint (multi-instance auto-discovery): apiserver, kube-controller-manager, kube-scheduler, etcd, coredns, kube-proxy, kubelet-node | view request latency / QPS, etc. | same codebase as the ksm collector, runs as a k8s deployment |
| Business metrics (not yet supported) | metrics endpoints exposed by pods | - | - |

Collection address configuration / discovery

  • A job is enabled only when its config section is present; to skip a metric category, remove its section
  • Each job has a user_specified switch controlling whether user-provided addresses are used, covering cases where a service component runs as a bare process and cannot be discovered from inside the cluster
  • With user_specified: true, the job's addrs is the list of collection URLs
  • With user_specified: false, the built-in discovery code takes over; the corresponding port, schema, metrics_path, etc. must be configured (see the fragment after the table below)
| Collection type | Address notes | Config / discovery notes |
| --- | --- | --- |
| Container base-resource metrics | kubelet-cadvisor; the kubelet listens on the node either on 0.0.0.0 or on the node's internal IP | by default k8s-mon resolves the address automatically from the configured port |
| k8s resource metrics | kube-stats-metrics | by default reached through coredns via the service http://kube-state-metrics.kube-system:8080/metrics; a user-specified address is also supported |
| k8s service-component metrics (master side): apiserver, kube-controller-manager, kube-scheduler, etcd, coredns | mind how these components are deployed: in pods, or as bare processes | by default k8s-mon assumes they run in pods and obtains the address list via getpod |
| k8s service-component metrics (per node): kube-proxy, kubelet-node | mind how these components are deployed: in pods, or as bare processes | by default k8s-mon assumes each node exposes ip:port/metrics, obtaining internal IPs via getnode; the services must listen on the internal IP or 0.0.0.0 |
| Business metrics (not yet supported) | metrics endpoints exposed by pods | - |
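
For instance, a hedged configMap fragment for discovery mode (user_specified: false); the field names port, schema and metrics_path come from the notes above, while the port value is only an example and must match your cluster:

kube_scheduler:
  user_specified: false
  schema: https
  port: 10259            # example value; use your scheduler's actual metrics port
  metrics_path: /metrics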

Usage guide

Installation steps

setup01: preparation

Prepare the k8s environment and make sure the Nightingale agent (n9e-agent) is deployed on every node

# create the kube-admin namespace
kubectl create ns kube-admin
# create the secret needed to access etcd; run on a master node (skip if not collecting etcd)
kubectl create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key --from-file=/etc/kubernetes/pki/etcd/ca.crt -n kube-admin
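
To verify the secret was created where k8s-mon expects it (a standard kubectl check, not part of the original steps):

kubectl get secret etcd-certs -n kube-admin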

Use the public registry image directly

# the public image is hosted on Aliyun:
# registry.cn-beijing.aliyuncs.com/n9e/k8s-mon:v1

Or clone the code and build the image yourself

mkdir -pv $GOPATH/src/github.com/n9e
cd $GOPATH/src/github.com/n9e
git clone https://github.com/n9e/k8s-mon

# push the image to your registry with docker or a CI tool
# if you rename the image, update the image field in the daemonset and deployment yaml files to match
# the image must be reachable from every node, so pushing it to a registry is best
cd k8s-mon && docker build -t k8s-mon:v1 .

setup02: required configuration changes

Change the Nightingale nid label name

  • The corresponding key in the config file is n9e_nid_label_name
  • Default: N9E_NID, the same container environment variable name that the earlier cadvisor-collecting k8s-mon required
  • To change it, edit the n9e_nid_label_name field in k8s-config/configMap_deployment.yaml and k8s-config/configMap_daemonset.yaml (a minimal fragment follows)
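
A minimal fragment of either configMap showing the key with its default value (surrounding structure omitted):

n9e_nid_label_name: N9E_NID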

Pass the above nid label in your pod yaml, e.g. N9E_NID

  • Example: define the N9E_NID label on a deployment's pods; suppose the module test-server01 maps to service-tree node nid 5
  • The base metrics of that pod's containers then appear under node nid=5, e.g. cpu.user
  • The pod's k8s resource metrics also appear under node nid=5, e.g. kube_deployment_status_replicas_available
  • Other custom labels are not collected, e.g. region: A, cluster: B
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-server01-deployment
  labels:
    app: test-server01
    # this marks the deployment's nid as 5
    N9E_NID: "5"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-server01
  template:
    metadata:
      labels:
        app: test-server01
        region: A
        cluster: B
        # this marks the containers started by this deployment as nid 5
        N9E_NID: "5"

Service-component monitoring requires server_side_nid

  • Edit k8s-config/configMap_deployment.yaml and set the server_side_nid: field to the nid of the leaf node chosen for service-component monitoring
  • Example: server_side_nid: "6" means node 6 is the k8s cluster's service-tree leaf, and all k8s control-plane metrics report there

setup03: tunable configuration (skip this section if you keep the defaults)

To skip a metric category, remove its config section

  • Example: to stop collecting apiserver metrics
  • remove or comment out the apiserver section in k8s-config/configMap_deployment.yaml
  • the deployment also collects per-node kube-proxy and kubelet (costly when the node count is large); remove those sections if you don't need them

Static sharding for per-node kube-proxy / kubelet collection

  • By default every node is collected, which causes performance problems at large node counts; enable sharded collection in that case
  • Example: with 10,000 nodes whose kube-proxy must be collected, deploy 3 k8s-mon instances, each enabling the kube_proxy section in its config
  • hash_mod_num is the total number of shards; hash_mod_shard is this instance's post-modulo index (range 0 ~ hash_mod_num-1)
  • The three instances then split the 10,000 nodes among themselves (a sketch of the assignment follows the examples below)
# instance 1
kube_proxy:
  hash_mod_num: 3
  hash_mod_shard: 0
# instance 2
kube_proxy:
  hash_mod_num: 3
  hash_mod_shard: 1
# instance 3
kube_proxy:
  hash_mod_num: 3
  hash_mod_shard: 2
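
The shard assignment plausibly works like the following sketch: hash the node name and keep only nodes whose hash modulo hash_mod_num equals this instance's hash_mod_shard. The FNV hash here is an assumption; k8s-mon's actual hash function may differ:

package main

import (
	"fmt"
	"hash/fnv"
)

// shardOwns reports whether the instance identified by hashModShard
// (out of hashModNum shards) should collect the given node.
func shardOwns(nodeName string, hashModNum, hashModShard uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(nodeName))
	return h.Sum32()%hashModNum == hashModShard
}

func main() {
	for _, n := range []string{"node-a", "node-b", "node-c", "node-d"} {
		for shard := uint32(0); shard < 3; shard++ {
			if shardOwns(n, 3, shard) {
				fmt.Printf("%s -> shard %d\n", n, shard)
			}
		}
	}
}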

Specifying collection addresses for a job

  • Example: to set kube-scheduler's collection addresses to https://1.1.1.1:1234/metrics and https://2.2.2.2:1234/metrics
  • edit user_specified and addrs in k8s-config/configMap_deployment.yaml:
kube_scheduler:
  user_specified: true
  addrs:
    - "https://1.1.1.1:1234/metrics"
    - "https://2.2.2.2:1234/metrics"

Adding custom tags to collected metrics

  • Edit the append_tags field in k8s-config/configMap_deployment.yaml and k8s-config/configMap_daemonset.yaml:
append_tags:
  key1: value1
  key2: value2

Changing the collection interval

  • Edit the collect_step field in k8s-config/configMap_deployment.yaml and k8s-config/configMap_daemonset.yaml

Changing a job's collection concurrency

  • Edit that job's concurrency_limit field in k8s-config/configMap_deployment.yaml; the default is 10

Changing the distinguishing label for multi-instance service-component collection

  • Edit the multi_server_instance_unique_label field in k8s-config/configMap_deployment.yaml

setup04: start the services

Start the ksm service (deployed in the kube-system namespace; start it only if you collect it)

kubectl apply -f k8s-config/kube-stats-metrics

Start the k8s-mon daemonset and deployment (deployed in the kube-admin namespace; start the daemonset and/or deployment as needed)

kubectl apply -f k8s-config

setup05: watch the logs, check the metrics

View the logs

kubectl logs -l app=k8s-mon-deployment  -n kube-admin  -f
kubectl logs -l app=k8s-mon-daemonset  -n kube-admin  -f

setup06: view metrics, import dashboards

View metrics in the instant-view page

# browse to the instant-view path: http://<n9e-address>/mon/dashboard?nid=<nid>

Import the dashboards

# the dashboard JSONs live in metrics-detail/夜莺大盘-xxx.json
# copy the three dashboard json files into /etc/screen on the Nightingale server
# or clone the Nightingale 3.5+ code, which ships the dashboard JSONs under etc/screen
# refresh the page and import the built-in dashboard at the desired node

Caveats

Collection interval

  • The kubelet embeds cadvisor for container collection; see the cadvisor housekeeping configuration docs
  • The kubelet also passes the relevant flags through on its command line
  • --housekeeping-interval duration Default: `10s` — the default collection interval is 10 seconds, so with default settings neither Prometheus nor k8s-mon should scrape more often than every 10s
  • cpu and mem metrics require the pod to set limits; without limits some metrics will be missing

Whitelists

  • Keep the default metrics whitelist if possible; an empty metrics_white_list collects everything
  • Configure the tag whitelist as needed

Histogram data

  • Histogram-based quantiles are provided, and all _bucket metrics are filtered out; quantiles 50, 90, 95 and 99 are computed
  • Computed by linear interpolation, roughly as Prometheus does; the difference is that Prometheus computes rate then sum, while k8s-mon computes sum then rate (a sketch of the interpolation follows this list)
  • Example: coredns_dns_request_duration_seconds_bucket --> coredns_dns_request_duration_seconds_quantile, the coredns resolution latency quantiles
  • The average is provided as well: coredns_dns_request_duration_seconds_bucket --> coredns_dns_request_duration_seconds_avg
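
A sketch of linear interpolation over cumulative buckets (the same general idea as Prometheus's histogram_quantile; edge cases like the +Inf bucket and counter resets are simplified, and this is not k8s-mon's actual code):

package main

import "fmt"

// quantileFromBuckets estimates quantile q (0..1) from cumulative
// histogram buckets by interpolating linearly inside the target bucket.
func quantileFromBuckets(q float64, upperBounds, cumCounts []float64) float64 {
	rank := q * cumCounts[len(cumCounts)-1]
	prevBound, prevCount := 0.0, 0.0
	for i, c := range cumCounts {
		if c >= rank {
			if c == prevCount { // empty bucket: no interpolation possible
				return upperBounds[i]
			}
			return prevBound + (upperBounds[i]-prevBound)*(rank-prevCount)/(c-prevCount)
		}
		prevBound, prevCount = upperBounds[i], c
	}
	return upperBounds[len(upperBounds)-1]
}

func main() {
	// le = 0.01, 0.05, 0.1, 0.5 with cumulative observation counts
	bounds := []float64{0.01, 0.05, 0.1, 0.5}
	counts := []float64{10, 60, 90, 100}
	fmt.Printf("p50 = %.4fs  p99 = %.4fs\n",
		quantileFromBuckets(0.50, bounds, counts),  // 0.0420s
		quantileFromBuckets(0.99, bounds, counts)) // 0.4600s
}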

Legacy cadvisor collection mode, i.e. collect_mode: cadvisor_plugin in the config file

Runs as a Nightingale plugin to collect monitoring metrics from docker containers

Quick build

    $ mkdir -p $GOPATH/src/github.com/n9e
    $ cd $GOPATH/src/github.com/n9e
    $ git clone https://github.com/n9e/k8s-mon.git
    $ cd k8s-mon
    $ make
    $ ./k8s-mon

Prerequisites

  1. cadvisor is installed and running on the host where the docker containers run
  2. The docker containers' environment variables include N9E_NID, whose value is a Nightingale service-tree node id; with N9E_NID = 1, the container metrics become visible under the node with id 1

Usage

  1. Distribute k8s-mon and k8s-mon.yml to the hosts running the containers
  2. Configure plugin collection on the node those hosts belong to

(screenshot: k8s-mon plugin configuration)

  3. Once configured, open the instant-view page, select the node, choose device-independent metrics, and the collected container metrics appear (screenshot: docker-metric)

Video tutorial

Watch address

Metric list

  • CPU
    cpu.user
    cpu.sys
    cpu.idle
    cpu.util
    cpu.periods
    cpu.throttled_periods
    cpu.throttled_time

  • Memory
    mem.bytes.total
    mem.bytes.used
    mem.bytes.used.percent
    mem.bytes.cached
    mem.bytes.rss
    mem.bytes.swap

  • Disk
    disk.io.read.bytes
    disk.io.write.bytes
    disk.bytes.total
    disk.bytes.used
    disk.bytes.used.percent

  • Network
    net.sockets.tcp.timewait
    net.in.bits
    net.in.pps
    net.in.errs
    net.in.dropped
    net.out.bits
    net.out.pps
    net.out.errs
    net.out.dropped
    net.tcp.established

  • System
    sys.ps.process.used
    sys.ps.thread.used
    sys.fd.count.used
    sys.socket.count.used
    sys.restart.count

Releases (v2.2.0)
  • v2.2.0 (Apr 13, 2021)

    2021-04-13

    • [BUGFIX] Fixed a tag-matching bug in ksm pre-aggregated metrics such as pod counts; the symptom was jumping curves, e.g. the cpu limit core count
    • [CHANGE] The deployment config no longer collects etcd by default; to collect etcd, uncomment the certificate-mounting lines in the deployment
    • [CHANGE] Container version bumped to v2.2.0
  • v2.1.0 (Apr 9, 2021)

    v2.1.0 / 2021-04-09

    • [FEATURE] k8s 1.20+ defaults to containerd as the container runtime; k8s-mon adapts when fetching container tags by trying the docker API first and falling back to the containerd API
    • [CHANGE] The pod runner image changed to yauritux/busybox-curl, which provides curl for easier troubleshooting
    • [CHANGE] Note: if you do not collect etcd and have not created the corresponding secret (e.g. on cloud-managed k8s), comment out the certificate-mounting lines in the deployment, otherwise the container will not start
    • [CHANGE] Container version bumped to v2.1.0
  • v2.0.7 (Mar 30, 2021)

    • [BUGFIX] The shared map used for held points / pre-aggregation changed from dataMap.Map to go-cache so entries can be GCed, preventing unreclaimed memory from stale data after pod rollovers
    • [CHANGE] version is injected at build time so the binary can print version info
  • v2.0.6 (Feb 24, 2021)

    • [CHANGE] In multi-instance collection, results with 0 metrics are no longer pushed
    • [CHANGE] Some routine info logs downgraded to debug; --log.level=debug adjusts the log level
  • v2.0.5 (Feb 24, 2021)

  • v2.0.4 (Jan 28, 2021)

    • [FEATURE] New ksm-derived metrics such as node cpu/mem request and limit ratios
    • [BUGFIX] ksm startup no longer sleeps, since the push bottleneck in transfer has been resolved
  • v2.0.3 (Jan 27, 2021)

  • v2.0.2 (Jan 26, 2021)

    • [FEATURE] Quantile computation for service-component histograms
    • [FEATURE] Average and success-rate computation
    • [FEATURE] etcd collection
    • [FEATURE] golang process metric collection
    • [FEATURE] preaggregation.md documents the pre-aggregated metrics
    • [ENHANCEMENT] All counter-type metric names get a _rate suffix to distinguish them
    • [CHANGE] Improved the readme and dashboards
  • v2.0.1 (Jan 20, 2021)

    • [BUGFIX] cpu and mem metrics require pod limits; without limits some metrics are missing
    • [BUGFIX] The daemonset now sets limits by default
    • [BUGFIX] GetPortListenAddr leaked fds by not closing connections promptly when fetching the internal IP
    • [BUGFIX] GetPortListenAddr is now called once before the ticker starts and its result passed in
    • [BUGFIX] Fixed a /var/run/docker.sock leak
    • [FEATURE] Base metrics get node_ip and node_name host-identifier tags
    • [ENHANCEMENT] ksm metrics without a nid report to server_side_nid by default, e.g. shared metrics like kube_node_status_allocatable_cpu_cores
    • [CHANGE] import package fmt
    • [CHANGE] readme updates
  • v1.0 (Jan 19, 2021)

    • [FEATURE] Per-node kube-proxy and kubelet-node collection supports configurable concurrency and static sharding
    • [FEATURE] Each service-component job supports user-specified addresses for cases where getpod/getnode cannot discover them
    • [CHANGE] A job runs only when its config section is present; remove a section to skip that category