Posts

Showing posts with the label work_mem

Hash Join

Image
Introduction In this article, we will learn about Hash Join, which is the preferred algorithm when joining two large tables without indexes, or when the data volume exceeds the optimization capability of a Nested Loop. How It Works Step 1: Build Phase First, PostgreSQL selects the smaller table to load into memory (RAM). Next, it takes each item and passes it into a Hash Function to receive a hash number. Postgres relies on this value to sort the results into the corresponding bucket, resulting in the creation of a Hash Table stored in RAM. Step 2: Probe Phase Now, PostgreSQL scans the larger table, also taking each item to pass into the Hash Function just like in step 1. Once the hash number is obtained, it only needs to access the corresponding bucket in the Hash Table to check. Hash Collision When using a Hash Function, there are still cases where two different values produce the same result, which is called a Hash Collision. In this case, values with the same Hash Number will resid...

Overhead in PostgreSQL

Image
Introduction In computer science and systems engineering, overhead is not a bug, but it is a phenomenon of system resource waste. Overhead is the amount of time, memory, bandwidth or CPU power consumed to manage or operate the system, rather than contributing directly to the actual data processing results. In database management systems, overhead occurs very frequently. Especially, if the system suffers from excessive memory (RAM) overhead, the Operating System (OS) will trigger a mechanism to kill that process to save the server. Common Types of Overhead in Postgres Postgres has a process-based architecture, meaning each client connection generates an independent process, so it easily encounters the following types of overhead: Connection Overhead: If you have 500 concurrent connections, Postgres will create 500 processes. The fact that the CPU must constantly context switch among these 500 processes creates a massive amount of overhead, significantly reducing performance. Memory Over...