Partial Index

Introduction

  • Partial Index is an extremely powerful feature that allows the creation of an index on only a subset of a table's data, instead of indexing the entire table. This subset is defined by a filtering condition in a WHERE clause when creating the index.
  • In essence, a Partial Index just adds a condition during index creation, so it can be used with all index types (such as B-Tree, Hash, GIN, GiST, ...) and supports all data types.
  • Additionally, a Partial Index can also be used in combination with a Constraint, Expression Index and Composite Index.

Advantages

  • Space-saving: The index size is much smaller than a Full Index, which saves RAM and disk space.
  • Increased Write performance (INSERT/UPDATE/DELETE): When adding or modifying data that does not satisfy the index condition, Postgres does not need to update the index tree.

Disadvantages

  • Queries must match the condition: The Postgres Query Planner only uses this index if the SELECT query has a WHERE clause that exactly matches or is a subset of the condition defined in the index.
  • Increased management overhead: If the filtering condition changes over time, for example, you create an index for this year because you need to query data frequently for speed, next year you will need to delete the old index (if it is no longer used much) and manually create an index for the new year.

Use Cases

You should use a Partial Index in the following scenarios:

  • Removing Biased Data, which are rows that make up most of the data but are not used frequently
    • Therefore, you do not need to create an index for all data, you only need to add an index for data that needs to be used frequently
    • For example, if an order status has many values but you only need to query with the status delivered to process, then you only need to create an index with this status
  • Avoiding indexing NULL values
    • When storing data, there are many cases where fields carry a null value that does not need to be used
    • You can exclude them and only add an index for non-null values
  • Optimizing for queries based on time
    • Used when your data spans across multiple time periods, but you only frequently query within specific timeframes such as the current month or year
    • Then you only need to add an index for the corresponding timeframe, but please remember to delete the indexes of old timeframes

Detail

First, let us create the table and Index as follows

CREATE TABLE tenant_members (
    member_id SERIAL PRIMARY KEY,
    tenant_id INT NOT NULL,
    email VARCHAR(255) NOT NULL,
    referral_code VARCHAR(50),
    is_default_contact BOOLEAN DEFAULT FALSE,
    status VARCHAR(50) NOT NULL,
    metadata JSONB NOT NULL,                
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Index 1
CREATE INDEX idx_jsonb_partial 
ON tenant_members USING gin (metadata)
WHERE status = 'published';

-- Index 2
CREATE INDEX idx_composite_partial 
ON tenant_members (tenant_id, email)
WHERE referral_code IS NOT NULL;

-- Index 3
CREATE INDEX idx_expression_partial 
ON tenant_members (DATE_TRUNC('day', created_at AT TIME ZONE 'UTC')) 
WHERE created_at >= '2026-01-01 00:00:00+00' AND created_at < '2027-01-01 00:00:00+00';

-- Index 4
CREATE UNIQUE INDEX idx_unique_default_address 
ON tenant_members (tenant_id) 
WHERE is_default_contact = TRUE;
  • Index 1: Create a GIN Partial Index on the JSONB field
  • Index 2: Create a Composite Partial Index
  • Index 3: Create an Expression Partial Index, this is the type of Index often used for data belonging to frequently used time periods
  • Index 4: Create a Unique Partial Index, requiring each tenant to have only 1 default contact


Execute the query as follows

-- Query 1
SELECT metadata FROM tenant_members WHERE status = 'published' AND metadata @> '{"title": "Software Engineer"}';

-- Query 2
SELECT tenant_id, email FROM tenant_members 
WHERE referral_code IS NOT NULL AND tenant_id = 3018 AND email = 'edwin_cruickshank4@tenant3018.com';

-- Query 3
SELECT DATE_TRUNC('day', created_at AT TIME ZONE 'UTC') AS registration_day, COUNT(*)
FROM tenant_members
WHERE created_at >= '2026-01-01 00:00:00+00' AND created_at < '2027-01-01 00:00:00+00'
GROUP BY registration_day;

-- Query 4
SELECT tenant_id, email FROM tenant_members WHERE tenant_id = 2 AND is_default_contact = TRUE;
  • Query 1: Find published articles that have a title of Software Engineer
  • Query 2: Find tenants that have a referral_code, this is a Composite Partial Index so the columns need to be placed in the correct order as when creating the Index
  • Query 3: Statistically count the number of tenants created each day
  • Query 4: Find the default contact of a tenant




Happy coding!

See more articles here.

Comments

Popular posts from this blog

All Practice Series

Kubernetes Deployment for Zero Downtime

Deploying a NodeJS Server on Google Kubernetes Engine

Sitemap

React Practice Series

Setting up Kubernetes Dashboard with Kind

Monitoring with cAdvisor, Prometheus and Grafana on Docker

Helm for beginer - Deploy nginx to Google Kubernetes Engine

A Handy Guide to Using Dynamic Import in JavaScript

Installing PostgreSQL with Docker