Kubernetes Horizontal Pod Autoscaling
Introduction
There are two common scaling methods: Vertical scaling and Horizontal scaling.
Vertical scaling involves adding more hardware, such as RAM or CPU, to an existing server. Horizontal scaling, on the other hand, means adding more instances of an app (or more server nodes) so that the available resources are fully utilized.
However, horizontal scaling has its limits. Once a node's resources are maxed out, vertical scaling (or adding more nodes to the cluster) becomes necessary. This article will focus on horizontal scaling using Kubernetes Horizontal Pod Autoscaling (HPA), which automatically scales the number of Pods up or down based on system demand.
Implementation Process
1. Build a Docker image for your application.
2. Deploy the image using a Deployment and LoadBalancer service.
3. Configure HPA to automatically scale resources.
To use HPA for auto-scaling based on CPU/Memory, Kubernetes must have the metrics-server installed. If you’re using a cloud provider, the metrics-server is usually installed by default. For local Kubernetes setups, you need to manually install the metrics-server.
If you’re using Kind for a local Kubernetes setup, follow these steps to install the metrics-server after successfully creating the cluster:
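A minimal sketch of the install: the `components.yaml` manifest is the official metrics-server release, and the `--kubelet-insecure-tls` flag is commonly needed on Kind because its nodes use self-signed kubelet certificates.

```bash
# Install the latest metrics-server release manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Kind nodes use self-signed kubelet certificates, so allow insecure TLS
kubectl patch -n kube-system deployment metrics-server --type=json \
  -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'

# Verify that metrics are being collected
kubectl top nodes
```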
1. Build a Docker Image for the Application
Use the following code block to create a NodeJS Express Server:
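A minimal sketch of such a server (the file name, port, and the small CPU-burning loop in the route are illustrative assumptions):

```javascript
// index.js - a minimal Express server used as the workload to scale
const express = require('express');

const app = express();
const PORT = process.env.PORT || 3000; // port is an assumption; adjust to your setup

app.get('/', (req, res) => {
  // A small amount of CPU work so load testing actually drives utilization up
  let sum = 0;
  for (let i = 0; i < 1e6; i++) sum += i;
  res.json({ message: 'Hello from the HPA demo server', sum });
});

app.listen(PORT, () => console.log(`Server listening on port ${PORT}`));
```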
Next, let's build the Docker image and push it to Google Artifact Registry or Docker Hub. You can refer to my guide on how to do this here.
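As a quick reference, a minimal Dockerfile for this server might look like the following (the base image and paths are assumptions):

```dockerfile
# Dockerfile - package the NodeJS Express server
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install --omit=dev
COPY . .
EXPOSE 3000
CMD ["node", "index.js"]
```

You would then build and push it with something like `docker build -t <your-registry>/express-hpa-demo:1.0.0 .` followed by `docker push <your-registry>/express-hpa-demo:1.0.0`, replacing the registry path with your own.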
2. Deploy the image using a Deployment and a LoadBalancer service
Create a `deployment.yml` file that includes the configuration for the Deployment to deploy the image you built, along with a LoadBalancer service, as shown below:
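A sketch of what that file might contain; the image name, replica count, labels, and resource values are illustrative assumptions you should adapt:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: express-hpa-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: express-hpa-demo
  template:
    metadata:
      labels:
        app: express-hpa-demo
    spec:
      containers:
        - name: express-hpa-demo
          image: <your-registry>/express-hpa-demo:1.0.0  # replace with your pushed image
          ports:
            - containerPort: 3000
          resources:
            requests:
              cpu: 100m       # HPA utilization percentages are computed against these requests
              memory: 128Mi
            limits:
              cpu: 200m
              memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: express-hpa-demo
spec:
  type: LoadBalancer
  selector:
    app: express-hpa-demo
  ports:
    - port: 80
      targetPort: 3000
```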
I've explained the details about deployment and the LoadBalancer service in this article.
Here, we also cover resource configuration. You can choose to configure CPU, Memory, or both, depending on which parameters you want to scale.
- If you define resource requests, HPA can scale based on a percentage (`averageUtilization`) of the requested resources.
- If you don't define resource requests, you must target an absolute per-Pod value (`averageValue`) instead, as illustrated in the snippet below.
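For illustration, the two target styles in the `autoscaling/v2` API look roughly like this (the values are examples only):

```yaml
# With resource requests defined: target a percentage of the request
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80

# Without requests: target an absolute average value per Pod instead
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: AverageValue
        averageValue: 500m
```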
3. Configuring HPA for Auto-scaling Resources
You can include the HPA configuration either in your `deployment.yml` file or in a separate file with the following content:
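A sketch of that HPA manifest, assuming it targets the `express-hpa-demo` Deployment above (all numbers are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: express-hpa-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: express-hpa-demo
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 3
          periodSeconds: 30
```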
- minReplicas, maxReplicas: the minimum and maximum number of replicas the HPA is allowed to scale to.
- metrics: defines the type of resource you want to scale on; in this case, it's the CPU.
- averageUtilization: a percentage of the requested CPU. When average utilization exceeds this value, the HPA scales up.
- behavior: optional. Here it defines the scale-down behavior, allowing at most 3 Pods to be removed every 30 seconds.
- stabilizationWindowSeconds: how long the system must remain stable before scaling down (the default is 300 seconds, i.e., 5 minutes).
Next, apply the file to create the resources as follows:
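For example, assuming everything lives in `deployment.yml`:

```bash
kubectl apply -f deployment.yml
```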
Note: I've defined all resources in a single file for simplicity, but in practice, you should separate each resource into individual YAML files for better management.
Once created, you can view information about the resources:
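Something like the following works; the exact output will depend on your cluster:

```bash
kubectl get deployment,service,hpa
```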
Testing HPA
You can test the API using any method you know. Here, I provide a code block to send 10 requests every second. Replace the URL with the EXTERNAL-IP of the LoadBalancer service.
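A simple loop of this kind could be (the URL is a placeholder for your EXTERNAL-IP):

```bash
# Send roughly 10 requests per second; stop with Ctrl+C
while true; do
  for i in $(seq 1 10); do
    curl -s "http://<EXTERNAL-IP>/" > /dev/null &
  done
  sleep 1
done
```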
After executing it, the server's resource usage will gradually increase, triggering the auto-scaling process.
You can check if HPA has performed the auto-scaling as follows:
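For example:

```bash
# Watch the HPA targets and replica count change over time
kubectl get hpa -w

# Optionally watch Pods being added and removed
kubectl get pods -w
```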
You will notice that the number of replicas gradually increases when CPU usage exceeds the 80% target (the averageUtilization field) and gradually decreases after a period of time (the stabilizationWindowSeconds field) once the system stabilizes.
See you in the next articles!