Large-scale Kubernetes cluster diagnostic tool.

Overview


KubeProber

What is KubeProber?

KubeProber is a diagnostic tool designed for large-scale Kubernetes clusters. It runs diagnostic items inside a Kubernetes cluster to verify that the cluster's functions are working normally. KubeProber has the following characteristics:

  • Large-scale cluster support: supports multi-cluster management; the relationship between clusters and diagnostic items is configured on the management side, and the diagnostic results of all clusters can be viewed in one place;
  • Cloud native: the core logic is implemented as an operator, providing full Kubernetes API compatibility;
  • Extensible: supports user-defined diagnostic items.

Unlike a monitoring system, KubeProber proves that the cluster is functioning normally from a diagnostic perspective. Monitoring works in the forward direction and cannot cover every scenario in the system: even if the monitoring data of every component is normal, that does not prove the system is 100% available. A tool is therefore needed to prove availability in the reverse direction, by actively exercising the cluster, so that unavailable points are discovered before users run into them, for example:

  • Whether all nodes in the cluster can be scheduled, whether any have unexpected taints, etc.;
  • Whether pods can be created and destroyed normally, verifying the entire chain from Kubernetes and kubelet to Docker;
  • Create a Service and test connectivity to verify that the kube-proxy path is working;
  • Resolve an internal or external domain name to verify that CoreDNS is working properly;
  • Access an ingress domain name to verify that the cluster's ingress component is working properly;
  • Create and delete a namespace to verify that the related webhooks are working properly;
  • Perform put/get/delete operations on etcd to verify that etcd is running normally;
  • Run operations through mysql-client to verify that MySQL is working normally;
  • Simulate a user logging in to and operating the business system to verify that the main business flow works end to end;
  • Check whether the certificates in each environment have expired;
  • Check cloud resources for expiration;
  • ... and more!
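Here is a minimal sketch of the certificate-expiry check from the list above, written as a shell function to make the idea of a diagnostic item concrete. It is a hypothetical example and not one of KubeProber's built-in probes. The function name check_cert_expiry and its OK/EXPIRING/ERROR output are assumptions. It relies only on openssl x509 -checkend, which exits with status 0 when the certificate will still be valid after the given number of seconds.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic item (not part of KubeProber): report whether a
# PEM certificate expires within the given number of days.

# check_cert_expiry <cert.pem> <days>
#   prints "OK" and returns 0 if the cert is still valid after <days> days
#   prints "EXPIRING" and returns 1 if it expires within that window
#   prints "ERROR: ..." and returns 2 if the file cannot be read
check_cert_expiry() {
  local cert="$1" days="$2"
  local seconds=$(( days * 86400 ))
  if [ ! -r "$cert" ]; then
    echo "ERROR: cannot read $cert"
    return 2
  fi
  # -checkend exits 0 when the certificate will NOT expire within $seconds.
  if openssl x509 -checkend "$seconds" -noout -in "$cert" >/dev/null 2>&1; then
    echo "OK"
  else
    echo "EXPIRING"
    return 1
  fi
}
```

For example, check_cert_expiry /etc/kubernetes/pki/apiserver.crt 30 would flag an API server certificate that expires within 30 days. That path is the usual kubeadm location and may differ in other clusters.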

Architecture

(Figure: KubeProber architecture)

probe-master

The operator running on the management cluster. This operator maintains two CRDs. Cluster manages the managed clusters, and Probe manages built-in and user-written diagnostic items. probe-master watches these two CRDs and pushes the latest diagnostic configuration to the managed clusters. It also provides an interface for viewing the diagnostic results of the managed clusters.

probe-agent

The operator running on each managed cluster. This operator maintains two CRDs. One is Probe, which is identical to the one in probe-master; probe-agent executes the cluster's diagnostic items according to the Probe definitions. The other is ProbeStatus, which records the diagnostic results of each Probe. Users can view the cluster's diagnostic results in the managed cluster with kubectl get probestatus.
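The sketch below wraps the inspection commands for the CRDs described above in two small helper functions, one for each side. The function names and the optional kubeconfig-context argument are hypothetical conveniences. kubectl get cluster and kubectl get probestatus come from this README. Listing Probe objects as probes assumes the CRD's plural name, which matches the selfLink shown in the issue further down.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting KubeProber CRDs with kubectl.
# Optional first argument: a kubeconfig context name.

# On the management (probe-master) cluster: managed clusters and probes.
master_overview() {
  if [ -n "${1:-}" ]; then set -- --context "$1"; else set --; fi
  kubectl "$@" get cluster
  kubectl "$@" get probes --all-namespaces
}

# On a managed (probe-agent) cluster: probes and their diagnostic results.
agent_overview() {
  if [ -n "${1:-}" ]; then set -- --context "$1"; else set --; fi
  kubectl "$@" get probes --all-namespaces
  kubectl "$@" get probestatus
}
```

Passing a context such as agent_overview my-managed-cluster lets one workstation query several managed clusters without switching the current context.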

Getting started

Installation

Both the master and the agent of KubeProber run as controllers in Kubernetes. Before installing, make sure that you have deployed a Kubernetes cluster and can access it with kubectl.

Deploy probe-master:

The webhook requires certificate verification, so deploy the cert-manager service first:

kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.3.1/cert-manager.yaml
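probe-master's webhook depends on cert-manager, so it can help to wait until cert-manager is ready before running make deploy. The sketch below does that wait. It assumes cert-manager runs in its default cert-manager namespace, which is where the v1.3.1 manifest above installs it. The function name and the 180-second default timeout are arbitrary choices.

```shell
#!/usr/bin/env bash
# Block until every Deployment in the cert-manager namespace reports the
# Available condition; kubectl wait exits non-zero if the timeout expires.
wait_for_cert_manager() {
  local timeout="${1:-180s}"
  kubectl wait --for=condition=Available deployment --all \
    --namespace cert-manager --timeout="$timeout"
}
```

Usage: wait_for_cert_manager && APP=probe-master make deploy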

install probe-master:

APP=probe-master make deploy

Deploy probe-agent:

Before deploying the agent, make sure that you have created a cluster on the master side:

kubectl apply -f config/samples/kubeprobe_v1_cluster.yaml
kubectl get cluster

Modify the configmap configuration after creating the cluster:

vim config/manager-probe-agent/manager.yaml

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: probeagent
  namespace: system
data:
  probe-conf.yaml: |
    probe_master_addr: http://kubeprober-probe-master.kubeprober.svc.cluster.local:8088
    cluster_name: moon
    secret_key: 2f5079a5-425c-4fb7-8518-562e1685c9b4

If only probe-agent is needed (e.g. for debugging/development, or just running probe cases in a single Kubernetes cluster), use the following configuration; probe-agent will then stop communicating with the master.

vim config/manager-probe-agent/manager.yaml

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: probeagent
  namespace: system
data:
  probe-conf.yaml: |
    # disabled by default; if enabled, probe-agent stops communicating with the master
    agent_debug: true
    # defaults to 1; increase it for more verbose output
    debug_level: 1

install probe-agent

APP=probe-agent make deploy

To start developing kubeprober

You can build and run probe-master and probe-agent locally. Please make sure that ~/.kube/config can access the Kubernetes cluster.

install crd && webhook resources

make dev

run probe-master

APP=probe-master make run

run probe-agent

Before running probe-agent, create a Cluster CRD resource, as described in the [Deploy probe-agent] section.

# create a local config file with the cluster info
cat << EOF > probe-conf.yaml
probe_master_addr: http://kubeprober-probe-master.kubeprober.svc.cluster.local:8088
cluster_name: moon
secret_key: 2f5079a5-425c-4fb7-8518-562e1685c9b4
EOF

# run probe-agent with config file
APP=probe-agent CONF=./probe-conf.yaml make run
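A quick sanity check of the local config file before make run can catch a missing key. The sketch below is hypothetical and KubeProber does not ship it. It uses plain grep rather than a YAML parser, so it only recognises top-level key: value lines like those in the file written above. A file with agent_debug: true is treated as standalone configuration. Otherwise the three keys used for talking to the master must be present and non-empty.

```shell
#!/usr/bin/env bash
# Hypothetical pre-run check for probe-conf.yaml (not part of KubeProber).
# Only inspects top-level "key: value" lines; it does not fully parse YAML.
# Prints the detected mode ("standalone" or "master") on success; reports
# each missing key on stderr and returns 1 otherwise.
validate_probe_conf() {
  local conf="$1" key missing=0
  if [ ! -r "$conf" ]; then
    echo "cannot read $conf" >&2
    return 2
  fi
  # Standalone/debug mode: the agent does not talk to the master.
  if grep -Eq '^agent_debug:[[:space:]]*true[[:space:]]*$' "$conf"; then
    echo "standalone"
    return 0
  fi
  # Master mode: every connection setting needs a non-empty value.
  for key in probe_master_addr cluster_name secret_key; do
    if ! grep -Eq "^${key}:[[:space:]]*[^[:space:]]" "$conf"; then
      echo "missing: $key" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] && echo "master"
  return "$missing"
}
```

Usage: validate_probe_conf ./probe-conf.yaml && APP=probe-agent CONF=./probe-conf.yaml make run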

build binary file

APP=probe-master make build
APP=probe-agent make build

build image

# build with default version: latest
# output image format: kubeprober/probe-master:latest
APP=probe-master make docker-build

# build with custom version: v0.0.1
# output image format: kubeprober/probe-master:v0.0.1
APP=probe-master V=v0.0.1 make docker-build

# build with default version: latest
APP=probe-agent make docker-build

# push with default version: latest
APP=probe-agent make docker-push

# build & push
APP=probe-agent make docker-build-push
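To build and push both images with the same tag, a small loop can drive the make targets above. This sketch assumes that docker-build-push honours the V variable the same way docker-build does; the README only shows V with docker-build. It also assumes you are logged in to a registry that accepts the kubeprober/... image names. release_images is a hypothetical name.

```shell
#!/usr/bin/env bash
# Build and push probe-master and probe-agent with one version tag.
# Assumption: docker-build-push accepts V just like docker-build.
release_images() {
  local version="${1:-latest}" app
  for app in probe-master probe-agent; do
    # Stop at the first failing build/push instead of continuing.
    APP="$app" V="$version" make docker-build-push || return 1
  done
}
```

Usage: release_images v0.0.1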

Write your prober

See the custom probes documentation.

Contributing

Contributions are always welcome. Please refer to Contributing to KubeProber for details.

Contact Us

If you have any questions, please feel free to contact us.

License

KubeProber is under the Apache 2.0 license. See the LICENSE file for details.

Comments
  • reconciler group

    reconciler group": "kubeprober.erda.cloud", "reconciler kind": "Probe", "name": "probe-test01", "namespace": "kubeprober", "error": "Job.batch \"probe-test01\" is invalid: [spec.template.spec.containers: Required value, spec.template.spec.restartPolicy: Unsupported value: \"Always\": supported values: \"OnFailure\", \"Never\"]"

    probe yaml

    apiVersion: kubeprober.erda.cloud/v1
    kind: Probe
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"kubeprober.erda.cloud/v1","kind":"Probe","metadata":{"annotations":{},"name":"probe-test01","namespace":"kubeprober"},"spec":{"probeList":[{"name":"probe-test01","spec":{"containers":[{"image":"kubeprober/demo-error:v0.0.1","name":"demo-error","resources":{"requests":{"cpu":"10m","memory":"50Mi"}}}],"restartPolicy":"Never"}}]}}
      creationTimestamp: "2021-08-18T09:24:26Z"
      generation: 1
      name: probe-test01
      namespace: kubeprober
      resourceVersion: "1255475"
      selfLink: /apis/kubeprober.erda.cloud/v1/namespaces/kubeprober/probes/probe-test01
      uid: 1c6a2f23-a68c-4283-9869-0c2ed3e891c7
    spec:
      policy: {}
      probeList:
      - name: probe-test01
        spec:
          containers:
          - image: kubeprober/demo-error:v0.0.1
            name: demo-error
            resources:
              requests:
                cpu: 10m
                memory: 50Mi
          restartPolicy: Never
    status:
      md5: 0e69f266c6e6d360c7e4130a4b4e6ff4

    (Screenshot: probe-agent error)

    kind/bug 
    opened by LeoWenxiang 3
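  • The standalone example does not run correctly

    What happened:

    The standalone example does not run correctly. Symptoms:

    1. probe-agent reports at startup that configmap probeagent cannot be found
    2. probe-agent logs report "forbidden" errors for list and other operations on various resources
    3. The Job run by the Prober example reports that configmap extra-config cannot be found

    What you expected to happen:

    The standalone example runs correctly

    How to reproduce it (as minimally and precisely as possible):

    1. Following the document https://docs.erda.cloud/1.5/manual/eco-tools/kubeprober/best-practices/standalone_kubeprober.html, the prober-agent container fails to start;
    2. Following the document https://docs.erda.cloud/1.5/manual/eco-tools/kubeprober/guides/first_prober.html, the prober example Job does not run correctly

    Anything else we need to know?:

    I have located the cause and fixed it myself: https://github.com/dotDuck/kubeprober/commit/ade3b0de37bf1ce586cc542e7dcbbfee1ef0d8c1#diff-ebef744877d88253a7e2a26f413155959097fe3659045131b32418fb9af80937 If you like, I can open a pull request.

    Cause of problem 1: declarations in probe-agent-standalone.yaml

    • The declaration of configmap probeagent is missing (symptom 1)
    • ServiceAccount kubeprober-worker is declared but never used; the ServiceAccount bound by kubeprober-worker-rolebinding should be changed to kubeprober (symptom 2)
    • Update the probe-agent image to the latest version on Docker Hub (optional)
    • Logs still report other missing configmaps, such as dice-cluster-info, dice-tools-info and dice-addon-info, but this does not affect overall operation

    Cause of problem 2: declarations in prober-demo-example

    • The declaration of configmap extra-config is missing, but there is nothing to base its contents on and no obvious place to add it, so a new configmap was created as a temporary workaround (symptom 3)

    Environment:

    • Erda version: none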
    • Kubernetes version (use kubectl version): Kind kubernetes v1.19.11
    kind/bug 
    opened by dotDuck 2
  • add README.md && modify Makefile to unify the build method

    What type of this PR /kind feature

    What this PR does / why we need it: probe agent: add README.md && modify Makefile to unify the build method

    Specified Reviewers: /assign @luobily @WeiXuSeu

    opened by sixther-dc 2
  • modify image eviction config

    What type of this PR /kind bug

    What this PR does / why we need it: Which issue(s) this PR fixes:

    modify imagefs eviction config

    Specified Reviewers: /assign @jianfeng Chen

    Need cherry-pick to release versions? /last version

    opened by Zhengjie610 1
  • Update Makefile

    What type of this PR

    /kind bug

    What this PR does / why we need it:

    Which issue(s) this PR fixes:

    • Failed to obtain operating system information, causing the kubectl-probe installation to fail

    Specified Reviewers:

    /assign @Jianfeng Chen

    Need cherry-pick to release versions?

    /last version

    opened by ai-run 1
  • update grafana image

    What type of this PR

    Add one of the following kinds: /kind feature update grafana image

    What this PR does / why we need it:

    Which issue(s) this PR fixes:

    • Fixes #your-issue_number
    • [Erda Cloud Issue Link](paste your link here)

    Specified Reviewers:

    /assign @sixther-dc

    Need cherry-pick to release versions?

    opened by sixther-dc 1
  • support erda alert statistical analysis

    What type of this PR

    Add one of the following kinds: /kind feature

    What this PR does / why we need it:

    support erda alert statistical analysis

    Which issue(s) this PR fixes:

    • Fixes #your-issue_number
    • [Erda Cloud Issue Link](paste your link here)

    Specified Reviewers:

    /assign @sixther-dc

    Need cherry-pick to release versions?

    opened by sixther-dc 1
  • add nginx-controller prober

    What type of this PR

    Add one of the following kinds: /kind feature

    What this PR does / why we need it:

    add nginx-controller prober

    Which issue(s) this PR fixes:

    • Fixes #your-issue_number
    • [Erda Cloud Issue Link](paste your link here)

    Specified Reviewers:

    /assign @sixther-dc @WeiXuSeu @luobily

    Need cherry-pick to release versions?

    opened by sixther-dc 1
  • kubectl probe plugin add update cpu & memory setting of probe-agent function

    What type of this PR

    Add one of the following kinds: /kind feature

    What this PR does / why we need it:

    kubectl probe plugin add update cpu & memory setting of probe-agent function

    Which issue(s) this PR fixes:

    • Fixes #your-issue_number
    • [Erda Cloud Issue Link](paste your link here)

    Specified Reviewers:

    /assign @sixther-dc @luobily

    Need cherry-pick to release versions?

    opened by sixther-dc 1
  • add kubernetes resource acount for probe-agent

    What type of this PR

    Add one of the following kinds: /kind feature

    What this PR does / why we need it:

    add kubernetes resource acount for probe-agent

    Which issue(s) this PR fixes:

    • Fixes #your-issue_number
    • [Erda Cloud Issue Link](paste your link here)

    Specified Reviewers:

    /assign @sixther-dc @WeiXuSeu @luobily

    Need cherry-pick to release versions?

    opened by sixther-dc 1
  • modify storage probe for data dist check

    What type of this PR

    Add one of the following kinds: /kind bug

    What this PR does / why we need it:

    modify storage probe for data dist check

    Which issue(s) this PR fixes:

    • Fixes #your-issue_number
    • [Erda Cloud Issue Link](paste your link here)

    Specified Reviewers:

    /assign @sixther-dc @luobily @WeiXuSeu

    Need cherry-pick to release versions?

    opened by sixther-dc 1
  • fix issue #125: using KubeProber in a single cluster (example)

    What type of this PR

    /kind bug

    What this PR does / why we need it:

    fix for https://docs.erda.cloud/2.0/manual/eco-tools/kubeprober/best-practices/standalone_kubeprober.html

    Which issue(s) this PR fixes:

    • Fixes #125

    Need cherry-pick to release versions?

    Depends on the repository owner.

    opened by dotDuck 0
Releases: v0.1.0
Owner: Erda (Erda Project)
cluster-api-state-metrics (CASM) is a service that listens to the Kubernetes API server and generates metrics about the state of custom resource objects related of Kubernetes Cluster API.

Overview cluster-api-state-metrics (CASM) is a service that listens to the Kubernetes API server and generates metrics about the state of custom resou

Daimler Group 61 Oct 27, 2022
KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes

Kubernetes-based Event Driven Autoscaling KEDA allows for fine-grained autoscaling (including to/from zero) for event driven Kubernetes workloads. KED

KEDA 5.9k Jan 7, 2023
kubetnl tunnels TCP connections from within a Kubernetes cluster to a cluster-external endpoint, e.g. to your local machine. (the perfect complement to kubectl port-forward)

kubetnl kubetnl (kube tunnel) is a command line utility to tunnel TCP connections from within a Kubernetes to a cluster-external endpoint, e.g. to you

null 5 Dec 16, 2022
A Terraform module to manage cluster authentication (aws-auth) for an Elastic Kubernetes (EKS) cluster on AWS.

Archive Notice The terraform-aws-modules/eks/aws v.18.20.0 release has brought back support aws-auth configmap! For this reason, I highly encourage us

Aidan Melen 28 Dec 4, 2022
PolarDB Stack is a DBaaS implementation for PolarDB-for-Postgres, as an operator creates and manages PolarDB/PostgreSQL clusters running in Kubernetes. It provides re-construct, failover switch-over, scale up/out, high-available capabilities for each cluster.

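PolarDB Stack open-source edition lifecycle 1 System overview PolarDB is a cloud-native relational database developed in-house by Alibaba Cloud, using a Shared-Storage architecture that separates storage and compute. The database moves from the traditional Share-Nothing architecture to a Shared-Storage architecture, from N compute + N storage copies to N compute + 1 storage copy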

null 23 Nov 8, 2022
Manage large fleets of Kubernetes clusters

Introduction Fleet is GitOps at scale. Fleet is designed to manage up to a million clusters. It's also lightweight enough that it works great for a si

Rancher 1.2k Dec 31, 2022
vcluster - Create fully functional virtual Kubernetes clusters - Each cluster runs inside a Kubernetes namespace and can be started within seconds

Website • Quickstart • Documentation • Blog • Twitter • Slack vcluster - Virtual Clusters For Kubernetes Lightweight & Low-Overhead - Based on k3s, bu

Loft Labs 2.3k Jan 4, 2023
PolarDB-X Operator is a Kubernetes extension that aims to create and manage PolarDB-X cluster on Kubernetes.

GalaxyKube -- PolarDB-X Operator PolarDB-X Operator is a Kubernetes extension that aims to create and manage PolarDB-X cluster on Kubernetes. It follo

null 64 Dec 19, 2022
kitex running in kubernetes cluster and discover each other in kubernetes Service way

Using kitex in kubernetes Kitex [kaɪt'eks] is a high-performance and strong-extensibility Golang RPC framework. This go module helps you to build mult

adolli 1 Feb 21, 2022
Go-gke-pulumi - A simple example that deploys a GKE cluster and an application to the cluster using pulumi

This example deploys a Google Cloud Platform (GCP) Google Kubernetes Engine (GKE) cluster and an application to it

Snigdha Sambit Aryakumar 1 Jan 25, 2022
Influxdb-cluster - InfluxDB Cluster for replacing InfluxDB Enterprise

InfluxDB ATTENTION: Around January 11th, 2019, master on this repository will be

Shiwen Cheng 524 Dec 26, 2022
Open Source runtime tool which help to detect malware code execution and run time mis-configuration change on a kubernetes cluster

Kube-Knark Project Trace your kubernetes runtime !! Kube-Knark is an open source tracer uses pcap & ebpf technology to perform runtime tracing on a de

Chen Keinan 32 Sep 19, 2022
Kubesecret is a command-line tool that prints secrets and configmaps data of a kubernetes cluster.

Kubesecret Kubesecret is a command-line tool that prints secrets and configmaps data of a kubernetes cluster. kubesecret -h for help pages. Install go

Charalampos Mitrodimas 18 May 3, 2022
gpupod is a tool to list and watch GPU pod in the kubernetes cluster.

gpupod gpupod is simple tool to list and watch GPU pod in kubernetes cluster. usage Usage: gpupod [flags] Flags: -t, --createdTime with pod c

null 0 Dec 8, 2021
Dominik Robert 0 Jan 4, 2022
A distributed append only commit log used for quick writes and reads to any scale

Maestro-DB A distributed append only commit log used for quick writes and reads to any scale Part 1 - Scaffolding Part-1 Notes Going to start off with

null 0 Nov 28, 2021
Planet Scale Robotics - Offload computation-heavy robotic operations to GPU powered world's first cloud-native robotics platform.

robolaunch Planet Scale Robotics - Offload computation-heavy robotic operations to GPU powered world's first cloud-native robotics platform. robola

robolaunch 27 Jan 1, 2023
Kube-step-podautoscaler - Controller to scale workloads based on steps

Refer controller/*controller.go for implementation details and explanation for a better understanding.

Danish Prakash 5 Sep 5, 2022
Linux provisioning scripts + application deployment tools. Suitable for self-hosting and hobby-scale application deployments.

Apollo Linux provisioning scripts + application deployment tools. Suitable for self-hosting and hobby-scale application deployments. Philosophy Linux-

K T Corp. 1 Feb 7, 2022