Large-scale Kubernetes cluster diagnostic tool.

Overview

KubeProber

What is KubeProber?

KubeProber is a diagnostic tool designed for large-scale Kubernetes clusters. It runs diagnostic items inside a Kubernetes cluster to prove that the cluster's functions work as expected. KubeProber has the following characteristics:

  • Large-scale cluster support: Supports multi-cluster management. On the management side you can configure which diagnostic items apply to which clusters and view the diagnostic results of all clusters in one place;
  • Cloud native: The core logic is implemented as an operator, providing full compatibility with the Kubernetes API;
  • Extensible: Supports user-defined diagnostic items.

Unlike a monitoring system, KubeProber proves that the cluster works from a diagnostic perspective. Monitoring is a forward link and cannot cover every scenario in the system: even when the monitoring data of every environment looks normal, that does not prove the system is 100% healthy. A tool is therefore needed to prove the availability of the system from the opposite direction and to discover unavailable points in the cluster before users do, for example:

  • Whether all nodes in the cluster can be scheduled, whether they carry special taints, etc.;
  • Whether a pod can be created and destroyed normally, verifying the entire link from Kubernetes through kubelet to Docker;
  • Create a service and test connectivity to verify whether the kube-proxy link is normal;
  • Resolve an internal or external domain name to verify whether CoreDNS is working properly;
  • Visit an ingress domain name to verify whether the ingress component in the cluster is working properly;
  • Create and delete a namespace to verify whether the related webhook is working properly;
  • Perform operations such as put/get/delete on etcd to verify whether etcd is running normally;
  • Run operations through mysql-client to verify that MySQL is working normally;
  • Simulate a user logging in and operating the business system to verify whether the main business process works end to end;
  • Check whether the certificates of each environment have expired;
  • Check whether cloud resources are about to expire;
  • ... and more!
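
A few of the checks above can be sketched as small shell functions. This is a minimal sketch, not KubeProber's built-in implementation: the function names are hypothetical, and it assumes that a diagnostic item signals its result through its exit status and that kubectl and getent are available where the check runs.

```shell
#!/bin/sh
# Hypothetical diagnostic checks; the names below are illustrative, not part of KubeProber.

# List nodes that are marked unschedulable (cordoned), one "node/<name>" per line.
unschedulable_nodes() {
    kubectl get nodes --field-selector spec.unschedulable=true -o name
}

# Return 0 if every node can be scheduled, 1 if some cannot, 2 if the query failed.
check_nodes_schedulable() {
    nodes=$(unschedulable_nodes) || return 2
    if [ -n "$nodes" ]; then
        echo "unschedulable nodes found:" >&2
        echo "$nodes" >&2
        return 1
    fi
    return 0
}

# Return 0 if the given domain name resolves through the system resolver.
check_dns() {
    getent hosts "$1" >/dev/null 2>&1
}
```

Run inside a pod, a call such as check_dns kubernetes.default.svc.cluster.local would exercise the cluster DNS path, since a pod's resolver normally points at CoreDNS.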

Architecture

Kubeprober Architecture

probe-master

The operator running on the management cluster. It maintains two CRDs: Cluster, which is used to manage the managed clusters, and Probe, which is used to manage the built-in and user-written diagnostic items. By watching these two CRDs, probe-master pushes the latest diagnostic configuration to the managed clusters. It also provides an interface for viewing the diagnostic results of the managed clusters.

probe-agent

The operator running on each managed cluster. It maintains two CRDs. One is Probe, which is identical to the Probe CRD on probe-master; probe-agent executes the cluster's diagnostic items according to the Probe definitions. The other is ProbeStatus, which records the diagnostic results of each Probe. Users can view the diagnostic results of the cluster by running kubectl get probestatus in the managed cluster.
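
To make the two CRDs more concrete, here is a hedged sketch of creating a Probe and then reading its ProbeStatus. The manifest fields are an assumption based on a sample Probe quoted in the project's issue tracker and may differ between versions; the function names and the default namespace kubeprober are hypothetical. The commands act on whichever cluster your kubectl context points at.

```shell
#!/bin/sh
# Hypothetical helpers; the Probe fields below are assumed from a sample manifest
# and are not a complete or authoritative schema.

# apply_demo_probe [NAMESPACE]: create a Probe with a single one-container diagnostic item.
apply_demo_probe() {
    ns=${1:-kubeprober}
    kubectl apply -n "$ns" -f - <<EOF
apiVersion: kubeprober.erda.cloud/v1
kind: Probe
metadata:
  name: probe-test01
spec:
  probeList:
    - name: probe-test01
      spec:
        containers:
          - name: demo-error
            image: kubeprober/demo-error:v0.0.1
            resources:
              requests:
                cpu: 10m
                memory: 50Mi
        restartPolicy: Never
EOF
}

# show_results [NAMESPACE]: list the ProbeStatus objects recorded by probe-agent.
show_results() {
    kubectl get probestatus -n "${1:-kubeprober}"
}
```

Calling apply_demo_probe followed by show_results should, under these assumptions, show one status entry for the demo probe once probe-agent has run it.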

Getting started

Installation

Both the master and the agent of KubeProber run as controllers in Kubernetes. Before installation, make sure that you have deployed a Kubernetes cluster and can access it using kubectl.

Deploy probe-master:

The webhook needs a certificate in order to be verified, so deploy the cert-manager service first:

kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.3.1/cert-manager.yaml
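
The webhook depends on cert-manager being up, so it can help to wait for it before installing probe-master. The following is a sketch under the assumption that the manifest above installs its Deployments into the cert-manager namespace; the function name and the default timeout are hypothetical.

```shell
#!/bin/sh
# wait_for_cert_manager [TIMEOUT]: block until all cert-manager Deployments
# report the Available condition, or fail once the timeout (default 120s) expires.
wait_for_cert_manager() {
    timeout=${1:-120s}
    kubectl wait --for=condition=Available deployment --all \
        --namespace cert-manager --timeout="$timeout"
}
```

A non-zero exit status from wait_for_cert_manager indicates that cert-manager did not become available in time, in which case deploying probe-master is likely to fail as well.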

Install probe-master:

APP=probe-master make deploy

Deploy probe-agent:

Before deploying the agent, make sure that you have created a cluster on the master side:

kubectl apply -f config/samples/kubeprobe_v1_cluster.yaml
kubectl get cluster

After creating the cluster, modify the ConfigMap configuration:

vim config/manager-probe-agent/manager.yaml

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: probeagent
  namespace: system
data:
  probe-conf.yaml: |
    probe_master_addr: http://kubeprober-probe-master.kubeprober.svc.cluster.local:8088
    cluster_name: moon
    secret_key: 2f5079a5-425c-4fb7-8518-562e1685c9b4

If you only need probe-agent (e.g. for debugging or development, or to run probe cases in a single Kubernetes cluster), use the following configuration instead; probe-agent will then stop communicating with the master.

vim config/manager-probe-agent/manager.yaml

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: probeagent
  namespace: system
data:
  probe-conf.yaml: |
    # disabled by default; if enabled, probe-agent stops communicating with the master
    agent_debug: true
    # defaults to 1; increase it for more verbose output
    debug_level: 1

Install probe-agent:

APP=probe-agent make deploy

To start developing KubeProber

You can build and run probe-master and probe-agent locally. Please make sure that ~/.kube/config can access the Kubernetes cluster.

Install CRD and webhook resources

make dev

Run probe-master

APP=probe-master make run

Run probe-agent

Before running probe-agent, create a Cluster CRD resource, as described in the Deploy probe-agent section.

# create local config yaml file
touch probe-conf.yaml

# write the configuration, e.g. cluster info
cat << EOF > probe-conf.yaml
probe_master_addr: http://kubeprober-probe-master.kubeprober.svc.cluster.local:8088
cluster_name: moon
secret_key: 2f5079a5-425c-4fb7-8518-562e1685c9b4
EOF

# run probe-agent with config file
APP=probe-agent CONF=./probe-conf.yaml make run
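
Before starting the agent it can be useful to check that the config file defines the three keys used above. This is a minimal sketch with a hypothetical function name; it only checks that each key is present with a non-empty value, not that the values are correct, and it does not apply when you run the agent with agent_debug enabled.

```shell
#!/bin/sh
# validate_probe_conf FILE: return 0 if FILE sets probe_master_addr, cluster_name
# and secret_key to non-empty values, 1 if any is missing, 2 if FILE is unreadable.
validate_probe_conf() {
    conf=$1
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 2; }
    missing=0
    for key in probe_master_addr cluster_name secret_key; do
        # Match "key:" at the start of a line followed by at least one non-space character.
        if ! grep -Eq "^${key}:[[:space:]]*[^[:space:]]" "$conf"; then
            echo "missing or empty key: $key" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, validate_probe_conf ./probe-conf.yaml && APP=probe-agent CONF=./probe-conf.yaml make run starts the agent only when the file passes the check.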

Build binary files

APP=probe-master make build
APP=probe-agent make build

Build images

# build with default version: latest
# output image format: kubeprober/probe-master:latest
APP=probe-master make docker-build

# build with custom version: v0.0.1
# output image format: kubeprober/probe-master:v0.0.1
APP=probe-master V=v0.0.1 make docker-build

# build with default version: latest
APP=probe-agent make docker-build

# push with default version: latest
APP=probe-agent make docker-push

# build & push
APP=probe-agent make docker-build-push

Write your prober

custom probes

Contributing

Contributions are always welcome. Please refer to Contributing to KubeProber for details.

Contact Us

If you have any questions, please feel free to contact us.

License

KubeProber is under the Apache 2.0 license. See the LICENSE file for details.
