Overhead in PostgreSQL
Introduction
- In computer science and systems engineering, overhead is not a bug, but it is a phenomenon of system resource waste.
- Overhead is the amount of time, memory, bandwidth or CPU power consumed to manage or operate the system, rather than contributing directly to the actual data processing results.
- In database management systems, overhead occurs very frequently. Especially, if the system suffers from excessive memory (RAM) overhead, the Operating System (OS) will trigger a mechanism to kill that process to save the server.
Common Types of Overhead in Postgres
Postgres has a process-based architecture, meaning each client connection generates an independent process, so it easily encounters the following types of overhead:
- Connection Overhead: If you have 500 concurrent connections, Postgres will create 500 processes. The fact that the CPU must constantly context switch among these 500 processes creates a massive amount of overhead, significantly reducing performance.
- Memory Overhead: In Postgres, there is a parameter called
work_memwhich is the memory allocated for each operation like Sort or Hash within a query. If you configure this value inappropriately for your existing system configuration, such as settingwork_mem = 64MB, then when 100 queries run concurrently, and each query performs 2-3 operations, the actual RAM consumed will be 64MB x 100 x 3 = 18.75GB. - Index Overhead: Creating multiple indexes helps speed up searches (READ), but causes overhead during data writes (WRITE), because for every
INSERT/UPDATE/DELETE, Postgres must update all relevant indexes. - Vacuum Overhead: Postgres uses the MVCC mechanism, which retains old versions of data rows when modifications occur. The
Autovacuumprocess must run in the background to clean up garbage. If there are too many database operations, Autovacuum will consume a lot of CPU and hard disk I/O resources.
OOM Killer
- When Postgres or other applications exhaust memory (RAM) and virtual memory (Swap), operating systems like Linux will trigger a mechanism called OOM Killer (Out of Memory Killer) to save the system from freezing completely.
- Linux will examine non-core system processes that consume a lot of RAM and have the longest running time to assign an OOM Score. Linux will send SIGKILL to the processes with the highest OOM Score to terminate them completely.
- Although Postgres utilizes a multi-process architecture, these processes all share a common memory area called Shared Buffer, so when a child process is suddenly killed, it does not have enough time to clean up or release resources.
- When the Postmaster (parent process) realizes a child process has terminated abruptly, it does not know how much the Shared Buffer has been altered, so to ensure data integrity, the Postmaster will automatically shut down the remaining child processes and terminate itself.
- As a result, other applications connecting to Postgres at that time will receive a connection loss error.
Solutions
- Batching/Chunking: Because the root cause of overhead is requests consuming too much RAM and CPU, the solution when dealing with a large volume of data is to break it down into smaller, manageable chunks to process a moderate amount of data at a time.
- Caching: Utilize caching tools like Redis to store data, preventing unnecessary requests to the database for frequently accessed information.
- Indexing: Apply appropriate indexes for each corresponding data type and dataset to avoid full heap scans across the entire data structure.
- Adjust
shared_buffersappropriately: This is the amount of memory used to cache data (Heap and Index) read from disk, which should only be set at 25% to a maximum of 40% of the total RAM capacity of the machine. - Adjust
work_memappropriately: This is the maximum amount of memory allocated for each sorting operation (ORDER BY, DISTINCT) or data JOIN (such as Hash Join and Merge Join).- If set too small, Postgres is forced to write temporary data to disk, which will slow down the query significantly.
- If set too large, when too many query connections arrive simultaneously and the system lacks sufficient memory, it will cause overhead.
- The popular safe level is from
32MBto64MB. - For exceptionally large tasks (running reports, data migration), you can increase it specifically for that session and run the query, and when the session closes, the value will automatically reset.
- Use a Connection Pooler: Install an additional connection pool like PgBouncer in front of Postgres to limit the actual number of connections to the database, reducing connection overhead.
- Note that if your system has only 1 server instance and you use database connection drivers like
pg, connection pooling is already supported by default. - But if the system supports scaling up to multiple Pods and they all connect to 1 database, you need to integrate an external Connection Pool to accept connections from all Pods before reaching the actual database.
- Note that if your system has only 1 server instance and you use database connection drivers like
Detail
First, to view the values of shared_buffers and work_mem, you can use the following queries:
SHOW shared_buffers;
SHOW work_mem;
SELECT name, setting, unit, current_setting(name)
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem');name: The name of the configuration variable in PostgreSQL.setting: The configuration value stored as a raw integer that Postgres uses for internal calculations. This value is always accompanied by the unit in the unit column.unit: The unit of measurement for the number in the setting column.current_setting: The actual value currently being applied, which has been automatically translated by Postgres into a human-readable format (such as MB, kB). You only need to look at this column to know the actual capacity.shared_buffers- current_setting = 128MB: The shared buffer cache of Postgres is currently only 128 Megabytes.
- Postgres manages memory by data blocks called "Pages", where each page defaults to
8kB(unit column). The setting column displays16384, meaning Postgres is allocating 16384 x 8 kB = 131,072 kB = 128 MB.
work_mem- current_setting = 4MB: Each ORDER or JOIN operation in a query is only allocated a maximum of 4 Megabytes of RAM.
- The unit column here is
kB, taking the setting of4096multiplied by1kBwill yield4096kB(4MB).
Next, to change the value, you need to check the path of the current configuration file as follows:
SHOW config_file;
After that, simply update the postgresql.conf file and change the values accordingly:
shared_buffers = 256MB
work_mem = 8MB
Then you need to restart PostgreSQL for the changes to take effect.
Check again as follows to ensure that Postgres is using the new configuration.
You can also use the following method to change work_mem for a single session:
SET work_mem = '250MB';
select * from test;
Note that a session means from the moment the connection is established until it disconnects or when you switch to querying on another tab.
If you want to reset to the default value, do the following:
RESET work_mem;
You can also use SET LOCAL to only apply the new work_mem within a single transaction:
BEGIN;
SET LOCAL work_mem = '256MB';
SELECT * FROM test;
COMMIT;
Happy coding!
Comments
Post a Comment