Posts

Showing posts with the label sql

PostgreSQL Practice Series

Image
Introduction PostgreSQL is a powerful open-source relational database management system, developed with many powerful features, high performance and famous for its reliability applied across various diverse projects. The outstanding advantages include: High data integrity and full ACID compliance. Powerful extensibility, allowing users to define their own data types, functions and custom indexes. Support for a wide variety of data types from structured (SQL) to unstructured (JSON/JSONB, XML). A strong development community, continuously updated and optimally secure. Detail Installing PostgreSQL with Docker Explanation of PostgreSQL Operations TOAST Storage Strategies Design schema Normalization and Denormalization Data Integrity Index B-Tree Index Using Hash Index GIN Index GIN Index with Array GIN Index with TEXT GIN Index with JsonB GiST Index SP-GiST Index BRIN Index Composite Index Expression Index Partial Index Join Nested Loop Join Hash Join Merge Join Scaling Overhead in Postgre...

Merge Join

Image
Introduction In this article, we will learn about Merge Join. How It Works Simply put, how Merge Join works is like sorting two lists in ascending order first, then placing them side by side for comparison. Assuming we need to compare the id column of two tables, A and B, the specific process is as follows: Step 1: Sort Phase PostgreSQL must ensure that both lists are already sorted by the order of the join column (for example, from smallest to largest). If the two tables are already sorted, such as when the join column is a Primary Key or has an Index, Postgres will skip this step and jump to step 2. Step 2: Merge Phase PostgreSQL will place two pointers starting from the first row of the two tables and begin comparing: If the results on both sides are equal: Return the result, then move the pointer in table B to the next row to continue checking (since data in a column can be duplicated). If the id in table A is smaller: Move the table A pointer to the next row. If the id in table A ...

Hash Join

Image
Introduction In this article, we will learn about Hash Join, which is the preferred algorithm when joining two large tables without indexes, or when the data volume exceeds the optimization capability of a Nested Loop. How It Works Step 1: Build Phase First, PostgreSQL selects the smaller table to load into memory (RAM). Next, it takes each item and passes it into a Hash Function to receive a hash number. Postgres relies on this value to sort the results into the corresponding bucket, resulting in the creation of a Hash Table stored in RAM. Step 2: Probe Phase Now, PostgreSQL scans the larger table, also taking each item to pass into the Hash Function just like in step 1. Once the hash number is obtained, it only needs to access the corresponding bucket in the Hash Table to check. Hash Collision When using a Hash Function, there are still cases where two different values produce the same result, which is called a Hash Collision. In this case, values with the same Hash Number will resid...

Normalization and Denormalization

Image
Introduction Data Anomalies Before diving into the main content of this article, we need to understand Data Anomalies, which are logical flaws or data discrepancies that occur in relational databases when proper table normalization has not been applied. This leads to information being stored repeatedly in a redundant manner. When data operations (Insert, Update, Delete) are performed, they cause the database to fall into an inconsistent state where data is correct in some places but incorrect in others. Data Anomalies are divided into 3 main types: Insertion Anomaly: Occurs when you cannot add a new record into the database because the system forces you to enter another piece of information that does not yet exist. Deletion Anomaly: Occurs when you delete one piece of information but accidentally lose another completely different and important piece of information. Update Anomaly: Occurs when a piece of information is repeated across too many rows. As a result, when you edit that infor...

BRIN Index

Image
Introduction BRIN (Block Range Index) works by grouping a cluster of pages in the HEAP together into a single range rather than creating an index for every single data row like a B-Tree. The structure of a data row inside a BRIN Index (called an Index Tuple) consists of the following three core components: blknum (Block Number): The block number starting the range (for example: 0, 128, 256, 512...). min_value: The minimum value of that column within a block range. max_value: The maximum value of that column within a block range. How It Works When executing a query with a BRIN Index, Postgres scans the index and checks whether the value being searched for falls within any [Min, Max] range If not, it skips that entire block range, eliminating a huge amount of redundant data. If yes, Postgres relies on the blknum to load the corresponding Pages in the HEAP up and checks them sequentially to find the exact data row. Advantages Super compact: Because it only stores the Min and Max for an en...