GiST Index

Introduction

GiST (Generalized Search Tree) is also a high-order balanced tree, which has a hierarchical structure similar to the traditional B-Tree structure. However, the core difference lies in the storage content inside nodes including

Root Node and Internal Nodes: each entry in a branch node contains two pieces of information:
- Predicate: This is the data area created by Postgres from a general level (at the root node) to a detailed level (at the internal node)
  - Nodes at higher levels will contain a general description for all child nodes below them
  - With such a structure, it helps to eliminate extremely quickly data areas that definitely do not satisfy the condition, instead of having to check each row one by one.
- TID points to lower-level child nodes.
Leaf Nodes contain information

Specific actual data of the field value
TID points to the row data in the Heap

Data type

GiST supports well for handling data types without linear order (cannot be sorted from smallest to largest like regular numbers or letters). The most common data types commonly used:

Geometric Types: Includes geometric data types and data points such as:
- point (Point coordinates)
- box (Rectangle)
- line / lseg (Line / Line segment)
- polygon (Polygon)
- circle (Circle)
Range Types: GiST is extremely powerful when handling value intervals, helping to search whether the intervals overlap each other or not.
- int4range, int8range (Integer interval)
- numrange (Real number interval)
- tsrange, tstzrange (Time interval without/with time zone)
- daterange (Date interval)
Network Address Types: inet and cidr used for IP addresses and network ranges
Text data type used for Full-Text Search such as tsvector and tsquery

Advantages

Flexible: Handles difficult data types that B-Tree cannot do.
Supports special operators: Such as intersection (&&), contained in (@>), or finding the nearest neighbor (<->).

Disadvantages

Slower creation/writing speed: The calculation to group data into nodes during INSERT/UPDATE is quite complex.
Large size: Usually consumes more storage space than B-Tree, but still smaller than GIN.

R-Tree

GiST is actually not a fixed data structure type, but rather an indexing framework provided by Postgres. GiST allows developers to custom-define various search tree algorithms based on their data types.
When you use GiST for geometric/spatial data (such as points, lines, polygons, GPS locations), Postgres uses R-Tree (Rectangle Tree) underneath the GiST framework. It differs from B-Tree in the following ways:
- B-Tree (Balanced Tree): Sorts data in a 1D linear order (smaller numbers on the left, larger numbers on the right along a number line).
  - Data dimensions: 1D (Works only on a straight line)
  - Partitioning principle: Based on comparisons like <, =, >
  - Parent nodes: Divide value boundaries (e.g., < 50 and >= 50)
  - Sorting (ASC/DESC): Yes (because it is always clear which element comes before/after)
- R-Tree (Rectangle Tree): Groups data in 2D, 3D, ... spaces using Minimum Bounding Rectangles (MBRs). Smaller rectangles are grouped into larger ones, forming a hierarchical tree structure.
  - Data dimensions: 2D, 3D, N-D (Multi-dimensional space)
  - Partitioning principle: Based on spatial regions/distance (Contains, Overlaps, Nearest)
  - Parent nodes: A large spatial region covering all child regions
  - Sorting (ASC/DESC): No (spatial shapes cannot be sorted in a simple ascending/descending order)

Example

Suppose you have a hotel reservation table where the booking_period column is of type daterange, with the following sample data:

Row 1 (Room A): ['2026-06-01', '2026-06-10')
Row 2 (Room B): ['2026-06-05', '2026-06-15')
Row 3 (Room C): ['2026-11-01', '2026-11-10')
Row 4 (Room D): ['2026-11-08', '2026-11-20')

How It Is Stored in the Index

GiST will not store these rows sequentially end-to-end like B-Tree. It will find time ranges that are close to each other or overlap to group them into a Bounding Range.

Group 1: Row 1 (06-01 to 06-10) and Row 2 (06-05 to 06-15) are close to each other, so they are grouped into Child Node N1 with the widest covering range: ['2026-06-01', '2026-06-15').
Group 2: Row 3 (11-01 to 11-10) and Row 4 (11-08 to 11-20) are close to each other, so they are grouped into Child Node N2 with the widest covering range: ['2026-11-01', '2026-11-20').
Root Node: Groups both N1 and N2 into an overall time range: ['2026-06-01', '2026-11-20').

Structure

The in-memory storage model will look like this:

How It Works

Suppose a guest wants to book a room for the period ['2026-06-08', '2026-06-12'). You write a query to find overlapping schedules using the && (Overlaps) operator:

Start at the Root Node: Check if ['2026-06-08', '2026-06-12') intersects with the overall range ['2026-06-01', '2026-11-20').
If it intersects, move to the next level:
- Check Node N1 ['2026-06-01', '2026-06-15'): Does it intersect with the guest's search period? If yes, traverse down N1.
- Check Node N2 ['2026-11-01', '2026-11-20'): Does it intersect? If no, immediately discard the entire N2 branch (no need to read Row 3 and Row 4).
At Node N1:
- Compare with Row 1: Overlaps, so Row 1 is selected.
- Compare with Row 2: Overlaps, so Row 2 is selected.

Usage

Should choose GiST Index to solve the following cases:

GIS / Map Data (PostGIS): Find points within an area, find the nearest entity (KNN search).
Range Types: Helps to quickly and effectively check the overlapping of time intervals such as overlapping room reservations, overlapping operating time of an object.
Full-text Search: Advanced keyword search in large text segments, although using GIN index is usually faster for reading text, GiST consumes less space and updates faster, so GiST is still used when you need to save disk space or tables have too continuous Update/Insert frequency.

Detail

I will provide an implementation example to avoid overlapping import and export times of trucks in the same warehouse, as well as easily check the schedule of trucks effectively

First, please enable the btree_gist extension, then create the table and GiST Index as follows

CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE warehouse_dock (
    id SERIAL PRIMARY KEY,
    dock_name VARCHAR(50) NOT NULL,
    location VARCHAR(100)
);

CREATE TABLE truck_schedule (
    id SERIAL PRIMARY KEY,
    truck_plate_number VARCHAR(20) NOT NULL,
    warehouse_dock_id INT REFERENCES warehouse_dock(id),
    operating_time tstzrange NOT NULL,
    activity_type VARCHAR(20) CHECK (activity_type IN ('INBOUND', 'OUTBOUND', 'OUT_FOR_DELIVERY')),

    CONSTRAINT check_schedule_overlap EXCLUDE USING gist (
        warehouse_dock_id WITH =,
        operating_time WITH &&
    ) WHERE (activity_type IN ('INBOUND', 'OUTBOUND'))
);

CREATE INDEX idx_truck_schedule_dock_time ON truck_schedule USING gist (warehouse_dock_id, operating_time);

operating_time uses data type tstzrange, this is a time range with timezone, you do not need to configure start time and end time manually and can still use GiST Index effectively
Constraint check_schedule_overlap is used to check not allowing 2 trucks to operate in the same warehouse during a time interval to avoid traffic jams and obstructing circulation during the export/import process, using USING gist means this constraint already has a built-in GiST Index created called Partial Index only for INBOUND and OUTBOUND data sets
Index idx_truck_schedule_dock_time is fully created for all operating_time

When inserting data, Postgres will use the Index check_schedule_overlap from the constraint to check the overlap time effectively to have a corresponding error message

INSERT INTO truck_schedule (truck_plate_number, warehouse_dock_id, operating_time, activity_type)
VALUES (
    '29C-123.45', 
    1, 
    '[2026-06-20 08:00:00+00, 2026-06-20 10:00:00+00)', 
    'INBOUND'
);

Here are some query examples as follows

SELECT truck_plate_number, 
       lower(operating_time) as start_time,
       upper(operating_time) as estimated_arrival_time
FROM truck_schedule
WHERE activity_type = 'OUT_FOR_DELIVERY' AND operating_time @> '2026-01-20 10:00:00+00'::timestamptz;

SELECT truck_plate_number, warehouse_dock_id, activity_type, 
       lower(operating_time) as check_in_time, 
       upper(operating_time) as check_out_time
FROM truck_schedule
WHERE operating_time && tstzrange('2026-06-20 09:00:00+00', '2026-06-20 10:00:00+00');

SELECT 
    wd.dock_name,
    CASE 
        WHEN ts.id IS NOT NULL THEN 'BUSY - Truck ' || ts.truck_plate_number
        ELSE 'AVAILABLE'
    END as status
FROM warehouse_dock wd
LEFT JOIN truck_schedule ts ON wd.id = ts.warehouse_dock_id 
    AND ts.operating_time @> '2026-06-20 09:00:00+00'::timestamptz AND ts.activity_type IN ('INBOUND', 'OUTBOUND')
WHERE wd.id = 1;

SELECT * FROM truck_schedule 
WHERE operating_time && tstzrange('2026-06-17 01:30:00+00', '2026-06-17 02:30:00+00') AND warehouse_dock_id = 1 AND activity_type IN ('INBOUND', 'OUTBOUND');

These 2 queries use index idx_truck_schedule_dock_time

Find the truck currently delivering goods at a specific timestamp
Find all truck schedules that have an overlap time with a time interval

These 2 queries use index check_schedule_overlap

Check the status of a specific warehouse at a time point
Check scheduling conflicts of a specific warehouse within a time interval

To view the data stored in the Index, you can use the following query, assuming you need to view the contents of Page 1:

SELECT * FROM gist_page_opaque_info(get_raw_page('check_schedule_overlap', 1));
SELECT * FROM gist_page_items_bytea(get_raw_page('check_schedule_overlap', 1));

Pay attention to the flags section: {leaf} indicates that this is a leaf node and it will contain data that points directly to the Heap.

Each row in the result contains a CTID, which you can use to access the data in the Heap directly.

SELECT 
    idx.itemoffset,
    idx.ctid,
    ts.warehouse_dock_id,
    ts.operating_time,
    ts.activity_type
FROM gist_page_items_bytea(get_raw_page('check_schedule_overlap', 1)) idx
JOIN truck_schedule ts ON ts.ctid = idx.ctid;