Posts

Showing posts with the label indexing

GiST Index

Image
Introduction GiST (Generalized Search Tree) is also a high-order balanced tree, which has a hierarchical structure similar to the traditional B-Tree structure. However, the core difference lies in the storage content inside nodes including Root Node and Internal Nodes: each entry in a branch node contains two pieces of information: Predicate: This is the data area created by Postgres from a general level (at the root node) to a detailed level (at the internal node) Nodes at higher levels will contain a general description for all child nodes below them With such a structure, it helps to eliminate extremely quickly data areas that definitely do not satisfy the condition, instead of having to check each row one by one. TID points to lower-level child nodes. Leaf Nodes contain information Specific actual data of the field value TID points to the row data in the Heap Data type GiST supports well for handling data types without linear order (cannot be sorted from smallest to largest like re...

Using Hash Index

Image
Introduction In this article, we will explore another type of index, which is the Hash index. True to its name, when using this index type, Postgres passes the value to be indexed through a hash function to generate a number ranging from 0 to 4,294,697,295 with a uint32 (4 bytes) type. Index Creation Process As soon as you create a Hash index, Postgres pre-allocates Pages to store the indexes. If the table does not have data yet, Postgres will still create: 1 Metapage : Always located at the first position (Block 0). This page stores the configuration of the index. 1 Bitmap page : Used to manage which pages in the file are currently empty to avoid Overflow pages. 10 default Primary Bucket Pages . Thus, right from the start, your newly created Hash Index file will have a size of 12 Pages x 8KB/Page = 96 KB. If the table already contains data, based on the configuration information about the Ram limit that Postgres can use, it will create the maximum number of Pages (that it can create)...

GIN Index with Array

Image
Introduction GIN GIN (Generalized Inverted Index) also uses a B-tree data structure, so it shares similar characteristics with B-Tree. However, unlike B-tree, which stores each value along with a Tid to a single row in the main table, GIN stores the value along with information to access multiple rows sharing that same value (referred to as a Posting List). Therefore, the difference in usage is that B-tree is used to search for a row containing a specific value, while GIN is used to search for small values nested inside a row. Posting List You can understand it as a list containing IDs, which are primary keys corresponding to each row in the main table. If the table does not have an ID, it will use a physical identifier called ctid (Column Tuple Identifier) created by Postgres, with a structure consisting of (Page number, row index within the Page). Based on this ctid, Postgres can access any row in the main table. However, as you know, data in the heap is arranged arbitrarily and when...

TOAST Storage Strategies

Image
Introduction As mentioned previously regarding storing data into HEAP, Postgres enforces a strict rule where a Row/Tuple must fit entirely within a single Page (8KB) and cannot overflow into another Page. The maximum size of a Row ranges from approximately 2KB to 8KB. If you intentionally insert a very long TEXT value or a file of several MBs into a row, an 8KB Page cannot accommodate such large data, prompting Postgres to trigger a mechanism called TOAST (The Oversized-Attribute Storage Technique). When you insert a data row whose size exceeds the allowed threshold (typically around 2KB), Postgres will not insert the entire row into the main HEAP. It executes the following three steps: Data compression: First, Postgres attempts to compress the oversized data to see if it fits within the 8KB Page. If successful, it is still inserted into the main HEAP. Chunking and moving to TOAST: If the data remains too large after compression, Postgres splits that 5MB data into multiple small chunks...