Kubernetes Horizontal Pod Autoscaling

Introduction

There are two common scaling methods: vertical scaling and horizontal scaling.

Vertical scaling means adding more hardware resources, such as RAM or CPU, to the machines that run your app. Horizontal scaling, on the other hand, means running more instances of the app to fully utilize the resources available on a node or server.

However, horizontal scaling has its limits. Once the nodes' resources are maxed out, adding more instances no longer helps and you need more hardware (bigger or additional nodes). This article focuses on horizontal scaling using Kubernetes Horizontal Pod Autoscaling (HPA), which automatically increases or decreases the number of Pod replicas based on observed load.

Implementation Process

1. Build a Docker image for your application.

2. Deploy the image using a Deployment and LoadBalancer service.

3. Configure HPA to automatically scale resources.


To use HPA for auto-scaling based on CPU or memory, the cluster must have the metrics-server installed. If you're using a cloud provider, the metrics-server is usually installed by default. For local Kubernetes setups, you need to install the metrics-server manually.

If you’re using Kind for a local Kubernetes setup, follow these steps to install the metrics-server after successfully creating the cluster:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Or install it with Helm:

helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server
helm upgrade --install metrics-server metrics-server/metrics-server --namespace kube-system


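To check whether the metrics-server is already installed and running:

kubectl get pods -n kube-system

On Kind, the metrics-server Pod may never become Ready because it cannot verify the kubelet's self-signed certificate. A common workaround for local clusters is to add the `--kubelet-insecure-tls` argument to the metrics-server container.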


1. Build a Docker Image for the Application

Use the following code block to create a NodeJS Express Server:

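import express from 'express'

const port = 3000
const app = express()

app
  .get('/', (_, res) => {
    res.send('This is NodeJS Typescript Application! Current time is ' + Date.now())
  })
  // CPU-heavy endpoint: sums the integers 0..value-1 to generate load
  .get('/sum', (req, res) => {
    const value = Number(req.query.value)
    // Reject missing, negative, or non-integer input (Array(NaN) would throw)
    if (!Number.isInteger(value) || value < 0) {
      res.status(400).json({error: 'value must be a non-negative integer'})
      return
    }
    const start = Date.now()
    const result = Array(value)
      .fill(0)
      .map((_, i) => i)
      .reduce((a, b) => a + b, 0) // initial value avoids a TypeError when value is 0
    const now = Date.now()
    const duration = now - start
    res.json({duration, now, result})
  })
  .listen(port, () => {
    console.log(`Server is running http://localhost:${port}`)
  })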


Next, let's build the Docker image and push it to Google Artifact Registry or Docker Hub. You can refer to my guide on how to do this here.


2. Deploy the image using a Deployment and a LoadBalancer service

Create a `deployment.yml` file that includes the configuration for the Deployment to deploy the image you built, along with a LoadBalancer service, as shown below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-name
  labels:
    name: label-name
spec:
  selector:
    matchLabels:
      app: label-name
  template:
    metadata:
      labels:
        app: label-name
    spec:
      restartPolicy: Always
      containers:
        - name: express-ts
          image: express-ts
          resources:
            requests: # reserved for the container; basis for HPA utilization
              memory: "100Mi"
              cpu: "100m"
            limits: # upper bound the container may use
              memory: "300Mi"
              cpu: "300m"
---
apiVersion: v1
kind: Service
metadata:
  name: service-name
  labels:
    service: label-name
spec:
  selector:
    app: label-name
  type: LoadBalancer
  ports:
    - protocol: TCP
      port: 80 # port exposed by the Service
      targetPort: 3000 # port the container listens on

I've explained the details about deployment and the LoadBalancer service in this article.

The Deployment above also configures resources. You can configure CPU, memory, or both, depending on which metric you want to scale on.

  • If you define resource requests, you can scale on a percentage of the requested value (target type Utilization).
  • If you don't define resource requests, you must specify an absolute target value for the metric instead (target type AverageValue).
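
To make the percentage case concrete, here is a minimal sketch of how a CPU utilization figure relates to the requests defined above. It assumes every Pod runs the same single container with the 100m request from the Deployment. The function names and the sample usage numbers are illustrative and not part of any Kubernetes API.

```javascript
// Convert a Kubernetes CPU quantity such as "100m" or "0.3" to millicores.
function parseCpuMillicores(quantity) {
  return quantity.endsWith('m')
    ? Number(quantity.slice(0, -1))
    : Math.round(Number(quantity) * 1000)
}

// Utilization as a percentage of the requested CPU, totalled over all Pods.
// Assumes at least one Pod and an identical request for each of them.
function cpuUtilizationPercent(usagesMillicores, requestMillicores) {
  const totalUsage = usagesMillicores.reduce((a, b) => a + b, 0)
  const totalRequest = requestMillicores * usagesMillicores.length
  return Math.round((totalUsage / totalRequest) * 100)
}

const request = parseCpuMillicores('100m')
// Two Pods using 90m and 130m (hypothetical readings)
console.log(cpuUtilizationPercent([90, 130], request)) // 110
```

Note that utilization is measured against the request, not the limit, so it can exceed 100% whenever the limit is higher than the request, as it is here.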


3. Configuring HPA for Auto-scaling Resources

You can include the HPA configuration either in your `deployment.yml` file or in a separate file with the following content:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deployment-name # the Deployment to scale
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu # scale based on CPU
        target:
          type: Utilization
          averageUtilization: 80 # target 80% of the requested CPU
  behavior:
    scaleDown:
      policies:
        - type: Pods
          periodSeconds: 30
          value: 3
      stabilizationWindowSeconds: 120

  • minReplicas, maxReplicas: the minimum and maximum number of Pod replicas HPA may run.
  • metrics: the metric to scale on; in this case, it's the CPU.
  • averageUtilization: a percentage of the requested resource, averaged across the Pods. When the measured value exceeds this target, HPA scales up.
  • behavior: optional. Here, it defines the scale-down behavior, allowing at most 3 Pods to be removed per 30-second period.
  • stabilizationWindowSeconds: how far back HPA looks at its previous recommendations before scaling down, so that short dips in load don't remove Pods immediately (the default for scale-down is 5 minutes).
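
The scaling decision behind these fields can be sketched with the formula given in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The function below is an illustrative model, not the controller's actual code. The function name and the sample numbers are assumptions, and the 10% tolerance reflects the documented default.

```javascript
// Illustrative model of the HPA replica calculation for one metric.
function desiredReplicas({current, currentUtilization, targetUtilization, min, max, tolerance = 0.1}) {
  const ratio = currentUtilization / targetUtilization
  // Within the tolerance band, the replica count is left unchanged.
  if (Math.abs(ratio - 1) <= tolerance) return current
  const desired = Math.ceil(current * ratio)
  // Clamp the result to minReplicas..maxReplicas.
  return Math.min(max, Math.max(min, desired))
}

// Values taken from the HPA manifest above
const hpa = {targetUtilization: 80, min: 1, max: 10}

console.log(desiredReplicas({...hpa, current: 2, currentUtilization: 160})) // 4
console.log(desiredReplicas({...hpa, current: 4, currentUtilization: 30}))  // 2
console.log(desiredReplicas({...hpa, current: 5, currentUtilization: 400})) // 10 (capped by max)
console.log(desiredReplicas({...hpa, current: 3, currentUtilization: 84}))  // 3 (within tolerance)
```

The real controller additionally applies the behavior policies and the stabilization window before changing the replica count, which this sketch leaves out.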


Next, apply the file to create the resources:

kubectl apply -f deployment.yml

Note: I've defined all resources in a single file for simplicity, but in practice, you should separate each resource into individual YAML files for better management.


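Once the resources are created, you can inspect them, for example with `kubectl get deployment,service,hpa`.


Make sure the API responds (for example, by requesting the EXTERNAL-IP of the LoadBalancer service) before continuing to test the HPA.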

Testing HPA

You can test the API using any method you know. Here, I provide a code block to send 10 requests every second. Replace the URL with the EXTERNAL-IP of the LoadBalancer service.

const numOfRequest = 10
const url = 'http://172.23.0.3/sum?value=10000000'
let idx = 0

// Every second, fire numOfRequest requests in parallel
setInterval(() => {
  Promise.all(
    Array(numOfRequest)
      .fill(0)
      .map(() =>
        fetch(url)
          .then(res => res.json())
          .then(data => console.log('Completed', ++idx, data.duration))
          .catch(console.error)
      )
  )
}, 1000)

After you run it, the server's CPU usage will gradually increase, triggering the auto-scaling process.
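
As a rough, back-of-the-envelope check of why this load triggers scaling, the sketch below estimates the CPU demand and the number of Pods needed to bring utilization back to the target. The 50 ms of CPU per request is a hypothetical figure; the `duration` value printed by the client gives an approximation for your own setup. The function names are illustrative only.

```javascript
// CPU-milliseconds consumed per wall-clock second equals millicores of demand.
function estimateCpuMillicores(requestsPerSecond, cpuMsPerRequest) {
  return requestsPerSecond * cpuMsPerRequest
}

// Pods needed so that average utilization does not exceed the target.
function podsNeeded(demandMillicores, requestMillicores, targetUtilizationPercent) {
  return Math.ceil((demandMillicores * 100) / (requestMillicores * targetUtilizationPercent))
}

// 10 requests per second, assumed 50 ms of CPU each
const demand = estimateCpuMillicores(10, 50)
console.log(demand) // 500 (millicores)

// 100m request and 80% target, as in the manifests above
console.log(podsNeeded(demand, 100, 80)) // 7
```

Under these assumptions a single Pod is far above the target, so HPA should add replicas until the average falls back under 80%.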


You can check whether HPA has performed the auto-scaling, for example by running `kubectl get hpa hpa-name --watch`.


You will notice that the number of replicas gradually increases once the average CPU utilization exceeds the 80% target (the averageUtilization field), and gradually decreases again once the load has stayed low for the stabilization window (the stabilizationWindowSeconds field).
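
The gradual decrease follows from the scale-down policy configured earlier. As a simplified sketch, assuming the stabilization window has already elapsed and the load stays low, the replica count drops by at most 3 Pods per 30-second period. The function name is hypothetical.

```javascript
// Replica count after each policy period while scaling down.
function scaleDownSteps(current, desired, maxPodsPerPeriod) {
  if (maxPodsPerPeriod < 1) throw new Error('maxPodsPerPeriod must be at least 1')
  const steps = []
  let replicas = current
  while (replicas > desired) {
    // Remove at most maxPodsPerPeriod Pods, never going below the desired count.
    replicas = Math.max(desired, replicas - maxPodsPerPeriod)
    steps.push(replicas)
  }
  return steps
}

// From maxReplicas (10) down to minReplicas (1), 3 Pods per period
console.log(scaleDownSteps(10, 1, 3)) // [ 7, 4, 1 ]
```

With the values from the manifest, going from 10 replicas back to 1 takes three periods in this model.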

See you in the next article!
