SP-GiST Index

Introduction

SP-GiST (Space-Partitioned Generalized Search Tree) operates based on the core idea of space partitioning, applied to complex data such as geographical coordinates (longitude, latitude), long text strings or hierarchical structures (directories).

Index Creation

SP-GiST Index has key characteristics including:

Non-overlapping space partitioning: When it divides space (or the dataset) into smaller parts, these parts do not overlap. A data point can only belong to branch A or branch B, not both.
Unbalanced Trees: Unlike B-Tree which is always balanced between branches, SP-GiST allows branches to grow freely. Whichever branch has more complex data will be more detailed, while branches with less data stop early. This helps save memory effectively.

For example, when there is a search keyword list consisting of: sam, sample, samsung, samurai and apple.

If using B-Tree Index, it will store these exact words
If using GIN Index, it will break down each word before storing
If using SP-GiST, the words will be decomposed into characters or character clusters (prefixes) and stored in a tree form as follows

You can see that it has the following characteristics:

No repetitive storage: The word sam is the common part of all 4 words, so it is stored exactly once in the Inner Node.
Leaf nodes: Store the remaining tail and point directly to the row containing the data in the heap.

Advantages

Index size optimization
- Since identical prefixes are only stored once, the index file size of SP-GiST is much smaller than that of B-Tree
- Therefore, it can be loaded into RAM for efficient usage.
Fixed number of search steps
- Search speed does not depend on how much data the table has, but depends on the number of branches to traverse.
- If that branch only has a few nodes, the search will be very fast regardless of how large the data is
No backtracking
- Since data does not overlap, it only needs to be checked within 1 branch
- If not found, it stops immediately without needing to check other branches like GiST

Limitations

Although fast to read and lightweight, when the data is too large, SP-GiST delivers poorer performance than B-Tree when performing data modifications (INSERT/UPDATE/DELETE)

B-Tree is a balanced tree that usually has very few levels
- Therefore, the number of search steps will be fixed and lower
- The cost of merging or splitting nodes is simply merging or splitting data in half without any other logic
SP-GiST is an unbalanced tree
- Therefore, when data is concentrated in a certain branch, it will take many steps to search
- When merging or splitting nodes, it must perform logical calculations based on the prefix of the Inner nodes, so the cost to restructure the tree is very large

Use cases

SP-GiST should be chosen to solve problems and data types such as:

Geometric/mapping data: If data points are divided into distinct clusters that do not overlap with each other, SP-GiST will search faster than regular GiST.
Text Prefix Search: Searching for words starting with a certain phrase, such as the Auto-complete feature.
Permission management and directory structure like in CMS systems
Searching for IP network ranges (such as 192.168.1.0/24) to implement features related to whitelists or blacklists

Comparison between GiST and SP-GiST

Although both are Generalized index structures used for complex data (such as geographical coordinates, geometric shapes, IP ranges), they have completely opposite operational philosophies.

GiST
- Is a balanced tree structure
- Data regions can overlap with each other, so when searching, it may have to check multiple branches
- Used to check for duplicates or find the intersection of data
SP-GiST
- Is not a balanced tree, the more detailed the data in a branch, the more nodes it has
- Data regions are not allowed to overlap, so when searching, it only needs to follow 1 branch
- Helps narrow down the scope accurately and search quickly

Detail

First, create the table and indexes as follows

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    product_name TEXT NOT NULL,
    category VARCHAR(50)
);

-- Index 1
CREATE INDEX idx_spgist_products_name ON products USING spgist (product_name);
-- Index 2
CREATE INDEX idx_spgist_lower_product_name ON products USING spgist (lower(product_name));

Above, I created 2 Indexes, with Index 1 being the default type, which will be case-sensitive when used
Therefore, to use case-insensitive, create an additional Index 2 with the lower function

The query method is as follows

-- Query 1
SELECT * FROM products WHERE product_name LIKE 'OptiMax Vision Chips W%'

-- Query 2
SELECT product_name FROM products WHERE starts_with(product_name, 'InfiniLink Data Shoes C');

-- Query 3
SELECT * FROM products
WHERE product_name >= 'DuraBuilt Heavy Pizza 5' AND product_name < 'DuraBuilt Heavy Pizza 6';

-- Query 4
SELECT * FROM products WHERE product_name ~ '^AlphaShield Safe Chair G';

-- Query 5
SELECT * FROM products WHERE product_name ILIKE 'optimax vision chips w%';

-- Query 6
SELECT * FROM products WHERE lower(product_name) LIKE lower('optimax Vision cHips w%');

Query 1: Uses LIKE and the % operator to search like B-Tree
Query 2: Uses the starts_with function, which is similar to LIKE but more explicit, because the SP-GiST Index stores the raw data of the column, querying only that column will trigger an Index Only Scan.
Query 3: Using normal string comparison can also activate the Index
Query 4: Uses the ~ operator to search with regex content
Query 5: Uses ILIKE to search case-insensitive, but it will not activate the Index because the Index is storing values as case-sensitive
Query 6: Uses LIKE, the lower function combined with the Index idx_spgist_lower_product_name allowing efficient case-insensitive search