GiST Index

Introduction

GiST (Generalized Search Tree) is also a high-order balanced tree, which has a hierarchical structure similar to the traditional B-Tree structure. However, the core difference lies in the storage content inside nodes including

  • Root Node and Internal Nodes: each entry in a branch node contains two pieces of information:
    • Predicate: This is the data area created by Postgres from a general level (at the root node) to a detailed level (at the internal node)
      • Nodes at higher levels will contain a general description for all child nodes below them
      • With such a structure, it helps to eliminate extremely quickly data areas that definitely do not satisfy the condition, instead of having to check each row one by one.
    • TID points to lower-level child nodes.
  • Leaf Nodes contain information
    • Specific actual data of the field value
    • TID points to the row data in the Heap

Data type

GiST supports well for handling data types without linear order (cannot be sorted from smallest to largest like regular numbers or letters). The most common data types commonly used:

  • Geometric Types: Includes geometric data types and data points such as:
    • point (Point coordinates)
    • box (Rectangle)
    • line / lseg (Line / Line segment)
    • polygon (Polygon)
    • circle (Circle)
  • Range Types: GiST is extremely powerful when handling value intervals, helping to search whether the intervals overlap each other or not.
    • int4range, int8range (Integer interval)
    • numrange (Real number interval)
    • tsrange, tstzrange (Time interval without/with time zone)
    • daterange (Date interval)
  • Network Address Types: inet and cidr used for IP addresses and network ranges
  • Text data type used for Full-Text Search such as tsvector and tsquery

Advantages

  • Flexible: Handles difficult data types that B-Tree cannot do.
  • Supports special operators: Such as intersection (&&), contained in (@>), or finding the nearest neighbor (<->).

Disadvantages

  • Slower creation/writing speed: The calculation to group data into nodes during INSERT/UPDATE is quite complex.
  • Large size: Usually consumes more storage space than B-Tree, but still smaller than GIN.

Usage

Should choose GiST Index to solve the following cases:

  • GIS / Map Data (PostGIS): Find points within an area, find the nearest entity (KNN search).
  • Range Types: Helps to quickly and effectively check the overlapping of time intervals such as overlapping room reservations, overlapping operating time of an object.
  • Full-text Search: Advanced keyword search in large text segments, although using GIN index is usually faster for reading text, GiST consumes less space and updates faster, so GiST is still used when you need to save disk space or tables have too continuous Update/Insert frequency.

Detail

I will provide an implementation example to avoid overlapping import and export times of trucks in the same warehouse, as well as easily check the schedule of trucks effectively

First, please enable the btree_gist extension, then create the table and GiST Index as follows

CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE warehouse_dock (
    id SERIAL PRIMARY KEY,
    dock_name VARCHAR(50) NOT NULL,
    location VARCHAR(100)
);

CREATE TABLE truck_schedule (
    id SERIAL PRIMARY KEY,
    truck_plate_number VARCHAR(20) NOT NULL,
    warehouse_dock_id INT REFERENCES warehouse_dock(id),
    operating_time tstzrange NOT NULL,
    activity_type VARCHAR(20) CHECK (activity_type IN ('INBOUND', 'OUTBOUND', 'OUT_FOR_DELIVERY')),

    CONSTRAINT check_schedule_overlap EXCLUDE USING gist (
        warehouse_dock_id WITH =,
        operating_time WITH &&
    ) WHERE (activity_type IN ('INBOUND', 'OUTBOUND'))
);

CREATE INDEX idx_truck_schedule_dock_time ON truck_schedule USING gist (warehouse_dock_id, operating_time);
  • operating_time uses data type tstzrange, this is a time range with timezone, you do not need to configure start time and end time manually and can still use GiST Index effectively
  • Constraint check_schedule_overlap is used to check not allowing 2 trucks to operate in the same warehouse during a time interval to avoid traffic jams and obstructing circulation during the export/import process, using USING gist means this constraint already has a built-in GiST Index created called Partial Index only for INBOUND and OUTBOUND data sets
  • Index idx_truck_schedule_dock_time is fully created for all operating_time



When inserting data, Postgres will use the Index check_schedule_overlap from the constraint to check the overlap time effectively to have a corresponding error message

INSERT INTO truck_schedule (truck_plate_number, warehouse_dock_id, operating_time, activity_type)
VALUES (
    '29C-123.45', 
    1, 
    '[2026-06-20 08:00:00+00, 2026-06-20 10:00:00+00)', 
    'INBOUND'
);


Here are some query examples as follows

SELECT truck_plate_number, 
       lower(operating_time) as start_time,
       upper(operating_time) as estimated_arrival_time
FROM truck_schedule
WHERE activity_type = 'OUT_FOR_DELIVERY' AND operating_time @> '2026-01-20 10:00:00+00'::timestamptz;

SELECT truck_plate_number, warehouse_dock_id, activity_type, 
       lower(operating_time) as check_in_time, 
       upper(operating_time) as check_out_time
FROM truck_schedule
WHERE operating_time && tstzrange('2026-06-20 09:00:00+00', '2026-06-20 10:00:00+00');

SELECT 
    wd.dock_name,
    CASE 
        WHEN ts.id IS NOT NULL THEN 'BUSY - Truck ' || ts.truck_plate_number
        ELSE 'AVAILABLE'
    END as status
FROM warehouse_dock wd
LEFT JOIN truck_schedule ts ON wd.id = ts.warehouse_dock_id 
    AND ts.operating_time @> '2026-06-20 09:00:00+00'::timestamptz AND ts.activity_type IN ('INBOUND', 'OUTBOUND')
WHERE wd.id = 1;

SELECT * FROM truck_schedule 
WHERE operating_time && tstzrange('2026-06-17 01:30:00+00', '2026-06-17 02:30:00+00') AND warehouse_dock_id = 1 AND activity_type IN ('INBOUND', 'OUTBOUND');

These 2 queries use index idx_truck_schedule_dock_time

  • Find the truck currently delivering goods at a specific timestamp
  • Find all truck schedules that have an overlap time with a time interval

These 2 queries use index check_schedule_overlap

  • Check the status of a specific warehouse at a time point
  • Check scheduling conflicts of a specific warehouse within a time interval


To view the data stored in the Index, you can use the following query, assuming you need to view the contents of Page 1:

SELECT * FROM gist_page_opaque_info(get_raw_page('check_schedule_overlap', 1));
SELECT * FROM gist_page_items_bytea(get_raw_page('check_schedule_overlap', 1));

Pay attention to the flags section: {leaf} indicates that this is a leaf node and it will contain data that points directly to the Heap.


Each row in the result contains a CTID, which you can use to access the data in the Heap directly.

SELECT 
    idx.itemoffset,
    idx.ctid,
    ts.warehouse_dock_id,
    ts.operating_time,
    ts.activity_type
FROM gist_page_items_bytea(get_raw_page('check_schedule_overlap', 1)) idx
JOIN truck_schedule ts ON ts.ctid = idx.ctid;

Happy coding!

See more articles here.

Comments

Popular posts from this blog

All Practice Series

Kubernetes Deployment for Zero Downtime

Deploying a NodeJS Server on Google Kubernetes Engine

Sitemap

Setting up Kubernetes Dashboard with Kind

React Practice Series

Monitoring with cAdvisor, Prometheus and Grafana on Docker

DevOps Practice Series

Helm for beginer - Deploy nginx to Google Kubernetes Engine

Using Kafka with Docker and NodeJS