Kubernetes Health Check and Auto Restart

Introduction

When you deploy an application to a production environment, various issues can cause it to stop working. These could be code bugs, database problems, or external service issues. Each problem requires a different solution. However, if you’re using Kubernetes to deploy your application and want it to automatically restart when an issue occurs, this article is for you.

Prerequisites

Before proceeding, ensure you have:

- A Kubernetes cluster set up. You can use Google Kubernetes Engine or set up a local Kubernetes cluster with Kind.
- Knowledge of Kubernetes, specifically how to create Deployments and Services.

Kubernetes Probes

In this article, I'll guide you through using three types of probes to check the status of your application:

1. Startup Probe

- As the name suggests, this probe runs when the application starts. It ensures the container has started successfully. Only after the Startup Probe succeeds do the Readiness and Liveness Probes execute.

2. Readiness Probe

- This probe is similar to the Startup Probe, but it answers a different question. A container that has started is not necessarily ready to serve requests: the application might still need a successful database connection or other services to become available. The Readiness Probe checks these conditions.

- It runs throughout the container’s lifecycle. Pods that don’t meet the conditions set by the Readiness Probe are removed from the Service’s endpoints and won’t receive traffic until the probe passes again. The Pod itself is not restarted or deleted. This probe helps direct traffic only to ready Pods.

3. Liveness Probe

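- This probe detects an application that is running but no longer working (for example, one that is deadlocked). It runs throughout the container’s lifecycle, and if the probe fails, the kubelet kills the container and restarts it according to the Pod’s `restartPolicy`.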
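
The division of labour between the three probes can be summarised in a small TypeScript model. This is only an illustrative sketch of the behaviour described above, not kubelet code; the `ProbeResults` type and the `kubeletAction` function are hypothetical names introduced here.

```typescript
// Illustrative model only: these names are hypothetical, not a Kubernetes API.
type ProbeResults = {
  startupSucceeded: boolean // has the Startup Probe passed yet?
  startupExhausted: boolean // has it used up its failureThreshold?
  ready: boolean            // current Readiness Probe verdict
  live: boolean             // current Liveness Probe verdict
}

type Action =
  | 'wait-for-startup'
  | 'restart-container'
  | 'remove-from-endpoints'
  | 'serve-traffic'

function kubeletAction(p: ProbeResults): Action {
  if (!p.startupSucceeded) {
    // Readiness and Liveness Probes are held back until startup succeeds.
    return p.startupExhausted ? 'restart-container' : 'wait-for-startup'
  }
  if (!p.live) return 'restart-container'      // failed liveness restarts the container
  if (!p.ready) return 'remove-from-endpoints' // failed readiness only stops traffic
  return 'serve-traffic'
}
```

The key point the model captures is that a failed Readiness Probe never restarts anything; only the Startup and Liveness Probes can trigger a restart.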

Example Usage

The following code sets up a Node.js server (using Express) with the endpoints the probes will call:

import express from 'express'

const port = 3000
const app = express()
let isReadyHealthZ = false
let isReadyReadiness = false

// Simulate a slow start: startup passes after 10 s, readiness after 15 s
setTimeout(() => (isReadyHealthZ = true), 10000)
setTimeout(() => (isReadyReadiness = true), 15000)

app
  .get('/', (_, res) => {
    res.send('This is NodeJS Typescript Application! Current time is ' + Date.now())
  })
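  .get('/healthz', (_, res) => {
    // Startup probe target: any status outside 200-399 counts as a failure
    if (isReadyHealthZ) res.json({ok: true})
    else res.status(503).json({ok: false})
  })
  .get('/readiness', (_, res) => {
    // Readiness probe target: 503 signals "not ready to receive traffic yet"
    if (isReadyReadiness) res.json({ok: true})
    else res.status(503).json({ok: false})
  })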
  .get('/liveness', (_, res) => {
    res.json({ok: true})
  })
  .get('/crash', () => {
    // Exit with a non-zero code to simulate a crash; Kubernetes then restarts the container
    process.exit(1)
  })
  .listen(port, () => {
    console.log(`Server is running http://localhost:${port}`)
  })

- `/healthz`: Used for the Startup probe, which runs only while the container is starting. Once it succeeds, Kubernetes begins running the other probes.
- `/readiness`: Used for the Readiness probe, which runs throughout the application's lifecycle. If it fails, the Pod is not deleted; it is removed from the Service's endpoints, and traffic goes only to ready Pods until the probe passes again.
- `/liveness`: Used for the Liveness probe, which runs throughout the application's lifecycle. If it fails, the container is restarted.
- `/crash`: When this API is called, the app exits. This is used to test auto-restart.

As for the setTimeout part, it's used to simulate a delay when the app starts. In a real-world scenario, you wouldn't need to use it.
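
In a real service, the `/readiness` handler would typically check actual dependencies instead of a timer. Here is a minimal sketch of that idea. `DependencyCheck` and `readinessStatus` are hypothetical names introduced for illustration, and each `check` function stands in for whatever your application really needs (a database ping, a cache connection, and so on).

```typescript
// Hypothetical helper type: a named check that resolves to true when healthy.
type DependencyCheck = { name: string; check: () => Promise<boolean> }

// Runs every check in parallel; a check that throws counts as a failure.
async function readinessStatus(
  checks: DependencyCheck[]
): Promise<{ status: number; failed: string[] }> {
  const results = await Promise.all(
    checks.map(async (c) => {
      try {
        return { name: c.name, ok: await c.check() }
      } catch {
        return { name: c.name, ok: false }
      }
    })
  )
  const failed = results.filter((r) => !r.ok).map((r) => r.name)
  // Kubernetes treats HTTP statuses from 200 to 399 as success; 503 signals "not ready".
  return { status: failed.length === 0 ? 200 : 503, failed }
}
```

Inside the Express handler, the result could be returned with something like `res.status(result.status).json(result)`, so the response body also shows which dependency is failing.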

Next, create a file named `deployment.yml` with the following content:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-name
  labels:
    name: label-name
spec:
  selector:
    matchLabels:
      app: label-name
  template:
    metadata:
      labels:
        app: label-name
    spec:
      restartPolicy: Always
      containers:
        - name: express-ts
          image: express-ts # update with your image name
          imagePullPolicy: Always
          ports:
            - containerPort: 3000
              name: deployment-port
          startupProbe:
            httpGet:
              path: /healthz
              port: deployment-port
            failureThreshold: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /readiness
              port: deployment-port
            failureThreshold: 1
            periodSeconds: 10
            initialDelaySeconds: 5
            successThreshold: 3
          livenessProbe:
            httpGet:
              path: /liveness
              port: deployment-port
            failureThreshold: 1
            periodSeconds: 10
            initialDelaySeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: service-name
  labels:
    service: label-name
spec:
  selector:
    app: label-name
  type: LoadBalancer
  ports:
    - protocol: TCP
      port: 80 # port exposed by the Service
      targetPort: deployment-port # named container port on the Pod

I have detailed the steps to create a Deployment and a LoadBalancer Service in an earlier article. You can refer back to it if you need more information.

You'll notice that the configurations for `startupProbe`, `readinessProbe`, and `livenessProbe` are quite similar. Each defines an `httpGet` check with a `path` (the API endpoint) and a `port` (the container port, referenced here by its name, `deployment-port`).

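- `failureThreshold`: The number of consecutive failures tolerated. After a failed check, the probe runs again after the `periodSeconds` interval. Once the consecutive failures reach `failureThreshold`, Kubernetes acts: for `startupProbe` and `livenessProbe` the container is restarted; for `readinessProbe` the Pod is marked as not ready and stops receiving traffic.
- `periodSeconds`: The interval between probe executions.
- `initialDelaySeconds`: The delay after the container starts before the probe first runs. In this example it is set only on `readinessProbe` and `livenessProbe`, although `startupProbe` accepts it as well.
- `successThreshold`: The number of consecutive successes required after a failure for the probe to count as passing. For `readinessProbe`, the API must pass this many times in a row before the Pod is considered ready. For `livenessProbe` and `startupProbe`, this value must be 1.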
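
To make the threshold semantics concrete, here is a small TypeScript sketch. It is a simplified model of consecutive-result counting, not the kubelet's actual implementation; `ProbeConfig`, `maxStartupSeconds`, and `simulateReady` are hypothetical names, and the numbers are taken from the `deployment.yml` above.

```typescript
type ProbeConfig = {
  periodSeconds: number
  failureThreshold: number
  successThreshold: number
}

// Rough upper bound on how long a startupProbe waits before giving up
// (ignores initialDelaySeconds and per-request timeouts).
function maxStartupSeconds(p: ProbeConfig): number {
  return p.failureThreshold * p.periodSeconds
}

// Replays probe results (true = pass) and returns the ready state after each one.
function simulateReady(p: ProbeConfig, results: boolean[]): boolean[] {
  let ready = false
  let passes = 0
  let failures = 0
  return results.map((ok) => {
    if (ok) {
      passes++
      failures = 0
      if (passes >= p.successThreshold) ready = true
    } else {
      failures++
      passes = 0
      if (failures >= p.failureThreshold) ready = false
    }
    return ready
  })
}

const startup: ProbeConfig = { periodSeconds: 10, failureThreshold: 30, successThreshold: 1 }
const readiness: ProbeConfig = { periodSeconds: 10, failureThreshold: 1, successThreshold: 3 }

console.log(maxStartupSeconds(startup)) // 300
console.log(simulateReady(readiness, [true, true, true, false, true]))
// ready after each check: false, false, true, false, false
```

With these values, the application gets up to roughly 300 seconds to start, and a single failed readiness check takes the Pod out of rotation until three checks in a row pass again.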

After setting up the configurations, apply them to create the resource.

kubectl apply -f deployment.yml

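Once the resources are created, run `kubectl get pods` and `kubectl get services` to check the Pod's status and find the Service's EXTERNAL-IP.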

Use the EXTERNAL-IP to access the application. Make sure everything is running smoothly by calling the /healthz, /readiness, and /liveness APIs. Then, call the /crash API to intentionally crash the app and check that the container restarts: the RESTARTS count shown by `kubectl get pods` should increase.
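
If you prefer to script these checks rather than call each endpoint by hand, a sketch along the following lines could work. `StatusFetcher` and `checkEndpoints` are hypothetical names introduced here; the fetcher is passed in as a parameter so the function can be exercised with a stub as well as with a real HTTP client.

```typescript
// Minimal shape of the HTTP response this sketch relies on.
type StatusFetcher = (url: string) => Promise<{ status: number }>

// Calls each probe endpoint and records whether Kubernetes would count it as a success.
async function checkEndpoints(
  baseUrl: string,
  fetcher: StatusFetcher
): Promise<Record<string, boolean>> {
  const report: Record<string, boolean> = {}
  for (const path of ['/healthz', '/readiness', '/liveness']) {
    const { status } = await fetcher(baseUrl + path)
    // An httpGet probe passes for any status from 200 to 399.
    report[path] = status >= 200 && status < 400
  }
  return report
}

// Example usage on Node 18+, where fetch is built in
// (replace the placeholder with your own EXTERNAL-IP):
// checkEndpoints('http://<EXTERNAL-IP>', fetch).then(console.log)
```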