SP-GiST Index
Introduction
SP-GiST (Space-Partitioned Generalized Search Tree) operates based on the core idea of space partitioning, applied to complex data such as geographical coordinates (longitude, latitude), long text strings or hierarchical structures (directories).
Index Creation
SP-GiST Index has key characteristics including:
- Non-overlapping space partitioning: When it divides space (or the dataset) into smaller parts, these parts do not overlap. A data point can only belong to branch A or branch B, not both.
- Unbalanced Trees: Unlike B-Tree which is always balanced between branches, SP-GiST allows branches to grow freely. Whichever branch has more complex data will be more detailed, while branches with less data stop early. This helps save memory effectively.
For example, when there is a search keyword list consisting of: sam, sample, samsung, samurai and apple.
- If using B-Tree Index, it will store these exact words
- If using GIN Index, it will break down each word before storing
- If using SP-GiST, the words will be decomposed into characters or character clusters (prefixes) and stored in a tree form as follows
You can see that it has the following characteristics:
- No repetitive storage: The word sam is the common part of all 4 words, so it is stored exactly once in the Inner Node.
- Leaf nodes: Store the remaining tail and point directly to the row containing the data in the heap.
Advantages
- Index size optimization
- Since identical prefixes are only stored once, the index file size of SP-GiST is much smaller than that of B-Tree
- Therefore, it can be loaded into RAM for efficient usage.
- Fixed number of search steps
- Search speed does not depend on how much data the table has, but depends on the number of branches to traverse.
- If that branch only has a few nodes, the search will be very fast regardless of how large the data is
- No backtracking
- Since data does not overlap, it only needs to be checked within 1 branch
- If not found, it stops immediately without needing to check other branches like GiST
Limitations
Although fast to read and lightweight, when the data is too large, SP-GiST delivers poorer performance than B-Tree when performing data modifications (INSERT/UPDATE/DELETE)
- B-Tree is a balanced tree that usually has very few levels
- Therefore, the number of search steps will be fixed and lower
- The cost of merging or splitting nodes is simply merging or splitting data in half without any other logic
- SP-GiST is an unbalanced tree
- Therefore, when data is concentrated in a certain branch, it will take many steps to search
- When merging or splitting nodes, it must perform logical calculations based on the prefix of the Inner nodes, so the cost to restructure the tree is very large
Use cases
SP-GiST should be chosen to solve problems and data types such as:
- Geometric/mapping data: If data points are divided into distinct clusters that do not overlap with each other, SP-GiST will search faster than regular GiST.
- Text Prefix Search: Searching for words starting with a certain phrase, such as the Auto-complete feature.
- Permission management and directory structure like in CMS systems
- Searching for IP network ranges (such as 192.168.1.0/24) to implement features related to whitelists or blacklists
Comparison between GiST and SP-GiST
Although both are Generalized index structures used for complex data (such as geographical coordinates, geometric shapes, IP ranges), they have completely opposite operational philosophies.
- GiST
- Is a balanced tree structure
- Data regions can overlap with each other, so when searching, it may have to check multiple branches
- Used to check for duplicates or find the intersection of data
- SP-GiST
- Is not a balanced tree, the more detailed the data in a branch, the more nodes it has
- Data regions are not allowed to overlap, so when searching, it only needs to follow 1 branch
- Helps narrow down the scope accurately and search quickly
Detail
First, create the table and indexes as follows
CREATE TABLE products (
id SERIAL PRIMARY KEY,
product_name TEXT NOT NULL,
category VARCHAR(50)
);
-- Index 1
CREATE INDEX idx_spgist_products_name ON products USING spgist (product_name);
-- Index 2
CREATE INDEX idx_spgist_lower_product_name ON products USING spgist (lower(product_name));
- Above, I created 2 Indexes, with Index 1 being the default type, which will be case-sensitive when used
- Therefore, to use case-insensitive, create an additional Index 2 with the
lowerfunction
The query method is as follows
-- Query 1
SELECT * FROM products WHERE product_name LIKE 'OptiMax Vision Chips W%'
-- Query 2
SELECT product_name FROM products WHERE starts_with(product_name, 'InfiniLink Data Shoes C');
-- Query 3
SELECT * FROM products
WHERE product_name >= 'DuraBuilt Heavy Pizza 5' AND product_name < 'DuraBuilt Heavy Pizza 6';
-- Query 4
SELECT * FROM products WHERE product_name ~ '^AlphaShield Safe Chair G';
-- Query 5
SELECT * FROM products WHERE product_name ILIKE 'optimax vision chips w%';
-- Query 6
SELECT * FROM products WHERE lower(product_name) LIKE lower('optimax Vision cHips w%');
- Query 1: Uses
LIKEand the%operator to search like B-Tree - Query 2: Uses the
starts_withfunction, which is similar toLIKEbut more explicit, because the SP-GiST Index stores the raw data of the column, querying only that column will trigger an Index Only Scan. - Query 3: Using normal string comparison can also activate the Index
- Query 4: Uses the
~operator to search with regex content - Query 5: Uses ILIKE to search case-insensitive, but it will not activate the Index because the Index is storing values as case-sensitive
- Query 6: Uses
LIKE, thelowerfunction combined with the Indexidx_spgist_lower_product_nameallowing efficient case-insensitive search
Happy coding!
Comments
Post a Comment