Health Check
You can run different types of checks against your Kubernetes cluster to detect any issues or potential problems before they cause any downtime or service disruptions. Check will run in the background and sends data to kubviz. After analysing the data from dashboard you can take corrective action quickly, if any issues are detected.
Configuration
All health checks are enabled by default upon installing the KubViz agent. They are automatically included, but if you don’t need them, you can disable it.
You’ll need to configure it to run health checks on your Kubernetes cluster.
kuberhealthy:
enabled: true
...
Types of Checks
Once you’ve configured it will start running health checks on your Kubernetes cluster. It supports a variety of health checks, The checks are:
Check Name | Description |
---|---|
Daemonset check | Ensures daemonsets can be successfully deployed |
DNS status check | Checks for failures with DNS, including resolving within the cluster and outside of the cluster |
Deployment check | Ensures that a Deployment and Service can be provisioned, created, and serve traffic within the Kubernetes cluster |
Image pull check | Verifies that an image can be pulled from an image repository |
Pod status check | Checks for unhealthy pod statuses in a target namespace |
Pod restart | Checks for excessive pod restarts in any namespace |
Resource quota check | Checks if resource quotas (CPU & memory) are available |
-
Daemonset, Deployment, and DNS checks are enabled by default when you enabling kuberhealthy.
-
Pod Status, Pod Restart, Image Pull, and Resource Quota checks need to be manually enabled.
Daemonset Check
-
Purpose of Daemonset Check: Validates the stable deployment and operation of daemonsets across all Kubernetes nodes, ensuring critical services are uniformly available.
-
It automatically deploys a test daemonset, verifies pod scheduling on each node, and checks for successful pod termination upon completion. The check runs every 60 minutes.
Deployment Check
-
Purpose of Deployment Check: Assesses the success of application deployments in the Kubernetes cluster, ensuring configurations and services are correctly launched.
-
Initiates a test deployment, evaluates the deployment process, service accessibility, and rollbacks if necessary, to ensure operational integrity.
DNS Check
-
Purpose of DNS Check: Ensures that DNS resolution is working correctly within the Kubernetes cluster, critical for service discovery and network communication.
-
Performs DNS lookups to validate the responsiveness and accuracy of the cluster’s DNS service, identifying potential issues early.
Image Pull Check
-
Image pull check is a custom check that requires manual enabling.
-
This container tests the availability of image respositories.
-
This check will run every 60 minutes. You can change this by modifying the
runInterval
.
imagePullCheck:
enabled: true
runInterval: 60m
timeout: 1m
image:
repository: kuberhealthy/test-check
tag: v1.4.0
extraEnvs:
REPORT_FAILURE: "false"
REPORT_DELAY: "1s"
resources:
requests:
cpu: 10m
memory: 50Mi
...
Steps to Follow Before Running the Image Pull Check
- Pull the test image from docker hub
docker pull kuberhealthy/test-check
- Push this image on the repository you need tested.
docker push my.repository/repo/test-check
- Replace the
repository
value with your repository.
- The pod is designed to attempt a pull of the test image from the remote repository (never from local). If the image is unavailable, an error will be reported to the API
Pod Status Check
-
Pod status check is a custom check that requires manual enabling.
-
Purpose of Pod Status Check: Monitors the health and status of pods within the Kubernetes cluster to ensure they are running and stable.
-
This check will run every 5 minutes. You can change this by modifying the
runInterval
.
podStatus:
enabled: true
runInterval: 5m
timeout: 15m
image:
registry: docker.io
repository: kuberhealthy/pod-status-check
tag: v1.3.0
allNamespaces: true
extraEnvs: {}
nodeSelector: {}
tolerations: []
resources:
requests:
cpu: 10m
memory: 50Mi
...
Pod Restart Check
-
Pod restart check is a custom check that requires manual enabling.
-
The Pod Restarts Check checks for excessive pod restarts in a given
POD_NAMESPACE
. -
The Pod Restarts Check deploys a pod that looks for pod resource events in a given
POD_NAMESPACE
and checks for Warning event types with reasonBackOff
. If this specific event type count exceeds theMAX_FAILURES_ALLOWED
, an error is reporting back. -
The check runs every 5m (spec.runInterval) with a check timeout set to 10 minutes (spec.timeout), and a
MAX_FAILURES_ALLOWED
count set to 10. If the check does not complete within the given timeout it will report a timeout error.
podRestarts:
enabled: true
runInterval: 5m
timeout: 10m
image:
registry: docker.io
repository: kuberhealthy/pod-restarts-check
tag: v2.5.0
allNamespaces: true
extraEnvs:
MAX_FAILURES_ALLOWED: "10"
nodeSelector: {}
tolerations: []
resources:
requests:
cpu: 10m
memory: 50Mi
...
Resource Quota Check
- This check tests if namespace resource quotas CPU and memory are under a specified threshold or percentage. It requires manual enabling.
resourceQuota:
enabled: true
runInterval: 1h
timeout: 2m
image:
repository: kuberhealthy/resource-quota-check
tag: v1.3.0
extraEnvs:
BLACKLIST: "default"
WHITELIST: "kube-system,kubviz"
resources:
requests:
cpu: 15m
memory: 15Mi
limits:
cpu: 30m
...
- Configurable check environment variables:
BLACKLIST
: Blacklist of namespaces to look at (default for BLACKLIST=default)WHITELIST
: Whitelist of namespaces to look at. (default for whitelist=kube-system,kubviz)