Partial Index
Introduction
- Partial Index is an extremely powerful feature that allows the creation of an index on only a subset of a table's data, instead of indexing the entire table. This subset is defined by a filtering condition in a WHERE clause when creating the index.
- In essence, a Partial Index just adds a condition during index creation, so it can be used with all index types (such as
B-Tree,Hash,GIN,GiST, ...) and supports all data types. - Additionally, a Partial Index can also be used in combination with a
Constraint,Expression IndexandComposite Index.
Advantages
- Space-saving: The index size is much smaller than a Full Index, which saves RAM and disk space.
- Increased Write performance (INSERT/UPDATE/DELETE): When adding or modifying data that does not satisfy the index condition, Postgres does not need to update the index tree.
Disadvantages
- Queries must match the condition: The Postgres
Query Planneronly uses this index if the SELECT query has a WHERE clause that exactly matches or is a subset of the condition defined in the index. - Increased management overhead: If the filtering condition changes over time, for example, you create an index for this year because you need to query data frequently for speed, next year you will need to delete the old index (if it is no longer used much) and manually create an index for the new year.
Use Cases
You should use a Partial Index in the following scenarios:
- Removing Biased Data, which are rows that make up most of the data but are not used frequently
- Therefore, you do not need to create an index for all data, you only need to add an index for data that needs to be used frequently
- For example, if an order status has many values but you only need to query with the status
deliveredto process, then you only need to create an index with this status
- Avoiding indexing
NULLvalues- When storing data, there are many cases where fields carry a null value that does not need to be used
- You can exclude them and only add an index for non-null values
- Optimizing for queries based on time
- Used when your data spans across multiple time periods, but you only frequently query within specific timeframes such as the current month or year
- Then you only need to add an index for the corresponding timeframe, but please remember to delete the indexes of old timeframes
Detail
First, let us create the table and Index as follows
CREATE TABLE tenant_members (
member_id SERIAL PRIMARY KEY,
tenant_id INT NOT NULL,
email VARCHAR(255) NOT NULL,
referral_code VARCHAR(50),
is_default_contact BOOLEAN DEFAULT FALSE,
status VARCHAR(50) NOT NULL,
metadata JSONB NOT NULL,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
-- Index 1
CREATE INDEX idx_jsonb_partial
ON tenant_members USING gin (metadata)
WHERE status = 'published';
-- Index 2
CREATE INDEX idx_composite_partial
ON tenant_members (tenant_id, email)
WHERE referral_code IS NOT NULL;
-- Index 3
CREATE INDEX idx_expression_partial
ON tenant_members (DATE_TRUNC('day', created_at AT TIME ZONE 'UTC'))
WHERE created_at >= '2026-01-01 00:00:00+00' AND created_at < '2027-01-01 00:00:00+00';
-- Index 4
CREATE UNIQUE INDEX idx_unique_default_address
ON tenant_members (tenant_id)
WHERE is_default_contact = TRUE;
- Index 1: Create a GIN Partial Index on the JSONB field
- Index 2: Create a Composite Partial Index
- Index 3: Create an Expression Partial Index, this is the type of Index often used for data belonging to frequently used time periods
- Index 4: Create a Unique Partial Index, requiring each tenant to have only 1 default contact
Execute the query as follows
-- Query 1
SELECT metadata FROM tenant_members WHERE status = 'published' AND metadata @> '{"title": "Software Engineer"}';
-- Query 2
SELECT tenant_id, email FROM tenant_members
WHERE referral_code IS NOT NULL AND tenant_id = 3018 AND email = 'edwin_cruickshank4@tenant3018.com';
-- Query 3
SELECT DATE_TRUNC('day', created_at AT TIME ZONE 'UTC') AS registration_day, COUNT(*)
FROM tenant_members
WHERE created_at >= '2026-01-01 00:00:00+00' AND created_at < '2027-01-01 00:00:00+00'
GROUP BY registration_day;
-- Query 4
SELECT tenant_id, email FROM tenant_members WHERE tenant_id = 2 AND is_default_contact = TRUE;
- Query 1: Find
publishedarticles that have atitleofSoftware Engineer - Query 2: Find tenants that have a
referral_code, this is a Composite Partial Index so the columns need to be placed in the correct order as when creating the Index - Query 3: Statistically count the number of tenants created each day
- Query 4: Find the default contact of a tenant
Happy coding!
Comments
Post a Comment