Health Probes Explained: Day 18 of 40daysofkubernetes

Introduction

In the dynamic and ever-evolving world of container orchestration, ensuring that applications run smoothly and reliably is paramount. Kubernetes, as a leading container orchestration platform, provides a robust mechanism to monitor and manage the health of applications through the use of health probes. These probes play a critical role in maintaining the stability and availability of services by continuously checking the health and readiness of application components.

In this blog, we will delve into the necessity of health probes in Kubernetes, explore the different types of probes available, and understand how they contribute to the seamless operation of containerized applications. Let's get started.

Health Probes

Health probes are mechanisms used to determine the health and readiness of containers running in a Pod. These probes help ensure that the applications running inside the containers are functioning correctly and are ready to accept traffic.

Types of Health Probes

Liveness Probe
- Purpose: Determines if a container is running. If the liveness probe fails, Kubernetes will kill the container and restart it according to the Pod's restart policy.
- Usage: Helps to ensure that applications are still running and can recover from failures.
Readiness Probe
- Purpose: Determines if a container is ready to start accepting traffic. If the readiness probe fails, the endpoints controller will remove the Pod's IP address from the endpoints of all services that match the Pod. This ensures that traffic is not sent to a Pod that is not ready.
- Usage: Ensures that only healthy Pods receive traffic, preventing downtime and errors.
Startup Probe
- Purpose: Determines if the application inside a container has started. While a startup probe is configured, liveness and readiness probes are held off until it succeeds. If the startup probe fails, Kubernetes will kill the container and restart it according to the Pod's restart policy. This probe is useful for applications that have a longer startup time.
- Usage: Useful for complex applications that take a long time to start and might fail initial readiness or liveness checks.

Real-Life Example

Let's use a real-life example to explain Kubernetes probes. Imagine you have a web application called mywebapp running in a Kubernetes cluster. This web application has three main characteristics:

- It takes some time to start.
- It needs to be ready to accept user requests.
- It should be restarted if it stops responding.

Here's how each type of probe can be used to ensure the application runs smoothly:

Scenario

You have a mywebapp container, and you want to:

- Ensure it starts correctly before it starts processing requests.
- Ensure it is always ready to handle requests.
- Restart it if it stops working properly.

Liveness Probe

Imagine mywebapp sometimes crashes or gets stuck. You want Kubernetes to restart it automatically in such cases.

Liveness Probe: Think of this as a periodic check-up to see if the application is still alive. It might check an endpoint like /healthz that returns 200 OK if the application is running well. If this check fails, Kubernetes will restart the container.

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20

Real-Life Example: Imagine you have a monitoring system that checks if a web server is running every 20 minutes. If the server doesn't respond, you restart it. Similarly, the liveness probe checks every 20 seconds (periodSeconds: 20) whether the app is alive, and Kubernetes restarts it if it's not.

Readiness Probe

Once mywebapp is running, you want to ensure it’s ready to handle user requests. For example, it might need to load some data or establish a database connection first.

Readiness Probe: This probe checks if the application is ready to serve traffic. It might check an endpoint like /ready that returns 200 OK only when the app is fully ready. If this check fails, the app won’t receive any traffic until it passes.

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5

Real-Life Example: Think of a restaurant that opens at 10 AM. Even if the restaurant is technically open, it may not be ready to serve customers until the staff has prepared everything. Similarly, the readiness probe checks if the app is ready to serve requests.

Startup Probe

Suppose mywebapp has a lengthy initialization process, like loading large datasets or performing some setup tasks. You want to give it enough time to start without failing initial checks.

Startup Probe: This probe checks if the application has started up correctly. It might check an endpoint like /startup that returns 200 OK once the app has fully started. This probe is useful for applications that take a long time to start and might fail initial liveness or readiness checks.

startupProbe:
  httpGet:
    path: /startup
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10

Real-Life Example: Imagine a car engine that needs some time to warm up before it can start running smoothly. The startup probe gives the app the time it needs to warm up and start correctly.
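
To see how the three probes fit together, here is a minimal sketch of a single Pod manifest for mywebapp. The image name example.com/mywebapp:1.0 and the failureThreshold value are assumptions for illustration; the paths and port are the ones used in the snippets above. Liveness and readiness checks only begin once the startup probe has succeeded, so the slow start is covered by the startup probe rather than by long initial delays.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mywebapp
spec:
  containers:
  - name: mywebapp
    image: example.com/mywebapp:1.0   # hypothetical image, replace with your own
    ports:
    - containerPort: 8080
    startupProbe:
      httpGet:
        path: /startup
        port: 8080
      periodSeconds: 10
      failureThreshold: 30   # assumed value: up to 30 x 10 = 300 seconds to start
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 20
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
```

With this layout the app has up to 300 seconds to finish starting. After that, the liveness probe decides when the container is restarted and the readiness probe decides whether the Pod receives traffic.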

Configuration of Probes

Probes can be configured using the following methods:

HTTP Probes: Send an HTTP GET request to the container. If the response status code is at least 200 and below 400, the container is considered healthy; any other code counts as a failure.

  livenessProbe:
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 3
    periodSeconds: 3

TCP Probes: Attempt to open a TCP connection to the specified port. If the connection is successful, the container is considered healthy.
```
  livenessProbe:
    tcpSocket:
      port: 8080
    initialDelaySeconds: 3
    periodSeconds: 3
```

Command Probes: Execute a command inside the container. If the command returns a zero exit status, the container is considered healthy.

  livenessProbe:
    exec:
      command:
      - cat
      - /tmp/healthy
    initialDelaySeconds: 3
    periodSeconds: 3

Hands-on Liveness Probe

Liveness Command Probe

 apiVersion: v1
 kind: Pod
 metadata:
   labels:
     test: liveness
   name: liveness-exec
 spec:
   containers:
   - name: liveness
     image: registry.k8s.io/busybox
     args:
     - /bin/sh
     - -c
     - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
     livenessProbe:
       exec:
         command:
         - cat 
         - /tmp/healthy
       initialDelaySeconds: 5
       periodSeconds: 5

Step-by-Step Explanation

Pod Creation: When you apply this YAML file, Kubernetes creates a Pod named liveness-exec with a single container using the busybox image.
Container Initialization:
- The container starts and executes the command specified in args:
```
  touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
```
- This command sequence does the following:
  1. Creates a file named /tmp/healthy.
  2. Sleeps (pauses) for 30 seconds.
  3. Deletes the /tmp/healthy file.
  4. Sleeps for 600 seconds (10 minutes).
Liveness Probe Initialization:
- The livenessProbe is configured to check the container's health using an exec command:
```
  livenessProbe:
    exec:
      command:
      - cat 
      - /tmp/healthy
    initialDelaySeconds: 5
    periodSeconds: 5
```
- initialDelaySeconds: 5: This means Kubernetes waits for 5 seconds after the container starts before performing the first liveness check.
- periodSeconds: 5: This means Kubernetes will perform the liveness check every 5 seconds.
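
Besides initialDelaySeconds and periodSeconds, a probe accepts a few more timing fields. The fragment below is a sketch of the same probe with those fields written out at their default values, so it should behave the same as the shorter version above; it is meant only to show what Kubernetes assumes when the fields are omitted.

```yaml
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5   # wait 5 seconds after the container starts
  periodSeconds: 5         # probe every 5 seconds
  timeoutSeconds: 1        # default: a probe that takes longer than 1 second counts as failed
  successThreshold: 1      # default: one success marks the container healthy again
  failureThreshold: 3      # default: restart after 3 consecutive failures
```

The failureThreshold default matters for the walkthrough that follows: a single failed check does not restart the container, three in a row do.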

Sequence of Events

Container Start:
- The container starts and executes the initial command: touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600.
Initial Delay (5 seconds):
- Kubernetes waits for 5 seconds before performing the first liveness probe.
First Liveness Probe (after 5 seconds):
- Kubernetes runs the liveness probe command: cat /tmp/healthy.
- Since the /tmp/healthy file exists, the command succeeds, and the container is considered healthy.
Subsequent Liveness Probes (every 5 seconds):
- Kubernetes continues to run the liveness probe command every 5 seconds.
- For the first 30 seconds, the /tmp/healthy file exists, so the container is considered healthy.
After 30 seconds:
- The container's command sequence deletes the /tmp/healthy file: rm -f /tmp/healthy.
- The container then sleeps for 600 seconds.
Liveness Probe Fails:
- Once the /tmp/healthy file is deleted, each subsequent liveness probe (which runs every 5 seconds) fails because the cat /tmp/healthy command returns a non-zero exit status (file not found).
Container Restart:
- After the probe has failed three times in a row (the default failureThreshold, since the manifest does not set one), Kubernetes considers the container unhealthy and restarts it.
- The container restarts, and the initial command sequence starts again, creating the /tmp/healthy file, and the process repeats.

Summary

Initial Delay: Kubernetes waits for 5 seconds before starting liveness checks.
Periodic Checks: Kubernetes checks the container's health every 5 seconds.
Health Check: The container is considered healthy as long as the /tmp/healthy file exists.
Failure and Restart: After 30 seconds, the file is deleted, causing the liveness probe to fail. After three consecutive failures (the default failureThreshold), Kubernetes restarts the container.

The Pod's restart count increases, and when you describe the Pod using kubectl describe pod/liveness-exec, the Events section shows the failed liveness probe and the container restart.

Liveness HTTP Probe

 apiVersion: v1
 kind: Pod
 metadata:
   name: hello
 spec:
   containers:
   - name: liveness
     image: registry.k8s.io/e2e-test-images/agnhost:2.40
     args:
     - liveness
     livenessProbe:
       httpGet:
         path: /healthz
         port: 8080
       initialDelaySeconds: 3
       periodSeconds: 3

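In the YAML file above, the liveness probe sends an HTTP GET request to /healthz on port 8080 every 3 seconds. The agnhost liveness server returns 200 on /healthz for the first 10 seconds of the container's life and 500 after that, so the probes start failing, and Kubernetes restarts the container after three consecutive failures (the default failureThreshold). The cycle then repeats.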

Liveness TCP Probe
```
 apiVersion: v1
 kind: Pod
 metadata:
   name: tcp-pod
   labels:
     app: tcp-pod
 spec:
   containers:
   - name: goproxy
     image: registry.k8s.io/goproxy:0.1
     ports:
     - containerPort: 8080
     livenessProbe:
       tcpSocket:
         port: 3000
       initialDelaySeconds: 10
       periodSeconds: 5
```
- Initial Delay: Kubernetes waits for 10 seconds before starting the liveness checks to allow the container to initialize.
- Periodic Checks: Kubernetes performs a health check every 5 seconds.
- Health Check: The liveness probe checks if a TCP connection to port 3000 can be established. If successful, the container is healthy.
- Failure and Restart: If the liveness probe fails (TCP connection cannot be established), Kubernetes will restart the container to ensure it remains healthy and functional.

Note that this manifest declares containerPort 8080 but probes port 3000. If the TCP connection to port 3000 fails (e.g., the application inside the container is not responding or the port is not open), the liveness probe fails. After repeated failures, Kubernetes considers the container unhealthy and restarts it to attempt to recover it.
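
For comparison, here is a sketch of the same Pod with the probe pointed at the port the manifest declares (8080). This assumes the goproxy image listens on that port, as the containerPort entry suggests; the Pod name tcp-pod-ok is made up for the example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tcp-pod-ok          # hypothetical name for this variant
  labels:
    app: tcp-pod
spec:
  containers:
  - name: goproxy
    image: registry.k8s.io/goproxy:0.1
    ports:
    - containerPort: 8080
    livenessProbe:
      tcpSocket:
        port: 8080          # matches containerPort, so the connection should succeed
      initialDelaySeconds: 10
      periodSeconds: 5
```

Running both Pods side by side makes the difference visible: the restart count of this variant should stay at 0, while a probe aimed at a closed port keeps triggering restarts.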

The other two probes (readiness and startup) use the same syntax. Please try a hands-on exercise with the readiness probe yourself for a better understanding.
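
As a starting point for that exercise, here is a minimal readiness probe sketch modeled on the liveness-exec Pod above. The Pod name readiness-exec, the /tmp/ready file, and the 20-second delay are assumptions chosen for illustration, not part of the original walkthrough.

```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: readiness
  name: readiness-exec      # hypothetical name
spec:
  containers:
  - name: readiness
    image: registry.k8s.io/busybox
    args:
    - /bin/sh
    - -c
    # Simulate a slow start: the ready file only appears after 20 seconds.
    - sleep 20; touch /tmp/ready; sleep 600
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/ready
      initialDelaySeconds: 5
      periodSeconds: 5
```

Watching the Pod with kubectl get pod readiness-exec -w should show READY 0/1 at first and 1/1 once the file exists. Unlike a failing liveness probe, a failing readiness probe does not restart the container; it only keeps the Pod out of Service endpoints.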

Resources I used:

https://www.youtube.com/watch?v=x2e6pIBLKzw&list=PLl4APkPHzsUUOkOv3i62UidrLmSB8DcGC&index=19&t=3s