Composite Index
Introduction
A Composite Index is an index type that contains two or more columns on the same table. It includes the following characteristics
- A Composite Index can be created for a maximum of 32 columns
- This is the default configuration according to the
INDEX_MAX_KEYSconstant of Postgres - In practice, you should not create an index with more than 3 to 4 columns because it increases the index size, slowing down
INSERT/UPDATE/DELETEoperations
- This is the default configuration according to the
- The column order is sorted ascendingly by default
- However, unlike a standard index where you can sort ascending or descending at will, when using a Composite Index, you can only sort the columns all ascending, all descending or in the exact order specified at creation time
- For example, when using
ON table (c1, c2)(default is all ascending), then - You can query
ORDER BY c1 ASC, c2 ASC - You can query
ORDER BY c1 DESC, c2 DESC - But the index will not work with the query
ORDER BY c1 ASC, c2 DESC
Operation Order
When using a Composite Index, you must query in the exact order from left to right as specified at creation time for the index to work For example, you create an index like this
CREATE INDEX idx_name ON table (c1, c2, c3)
Then the following queries can activate the index
SELECT * FROM table WHERE c1 = 1;
SELECT * FROM table WHERE c1 = 1 AND c2 = 2;
SELECT * FROM table WHERE c1 = 1 AND c2 = 2 AND c3 = 3;
The index will not work with these queries
SELECT * FROM table WHERE c2 = 2;
SELECT * FROM table WHERE c3 = 3;
To ensure the index operates efficiently, apply the following design principles:
- High Cardinality First: The column with the least duplicate values (such as
user_id,email) should be placed at the first position on the left, while columns with many duplicates (such asgender,status) are placed behind.- Because when a query retrieves records with low duplication first, the next condition only needs to process on that small dataset, which is more efficient
- Conversely, if you retrieve high duplication records first, the next condition will have to process more on this large dataset
- Prioritize Equality Operators: Columns used with equality operations (
=, IS NULL) should stand before columns used with range or comparison operations (<, >, LIKE, BETWEEN).- The explanation is similar to the above because an equality comparison operation usually retrieves a smaller amount of records and processing the range or comparison operation based on that dataset will be more efficient
Detail
First, let us create the table and Index for 3 columns as follows
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
tenant_id INT NOT NULL,
status VARCHAR(20) NOT NULL,
created_at TIMESTAMP NOT NULL,
total_amount NUMERIC(10, 2),
customer_email VARCHAR(100)
);
CREATE INDEX idx_orders_tenant_status_date ON orders (tenant_id ASC, status ASC, created_at DESC);
Above, I created the Index explicitly to clearly show the sorting order of the columns, but if you do not use ASC, the columns will also default to ascending order
SELECT tenant_id, status, created_at FROM orders WHERE tenant_id = 5
SELECT tenant_id, status, created_at FROM orders
WHERE tenant_id = 5 AND created_at >= '2026-01-01 00:00:00'
SELECT tenant_id, status, created_at FROM orders WHERE status = 'completed'
SELECT tenant_id, status, created_at FROM orders WHERE created_at >= '2026-01-01 00:00:00'
- Query 1: Comparing with the
tenant_idfield, which is the first column in the Index, so this query works well with the Index - Query 2: The Index will still be activated because it filters by the condition of the
tenant_idcolumn first and once the data is available, it only needs to filter further by the condition ofcreated_at - Query 3: Uses a Seq Scan because filtering by the
statuscolumn does not match the correct order of the Index - Query 4: Similar to the above, filtering by the
created_atcolumn also does not match the order in the Index, so it will use a Seq Scan
Check the queries with ordering as follows
SELECT tenant_id, status, created_at FROM orders
WHERE tenant_id > 5 AND created_at >= '2026-01-01 00:00:00'
ORDER BY tenant_id ASC, status ASC, created_at DESC;
SELECT tenant_id, status, created_at FROM orders
WHERE tenant_id > 5 AND created_at >= '2026-01-01 00:00:00'
ORDER BY tenant_id DESC, status DESC, created_at ASC;
SELECT tenant_id, status, created_at FROM orders
WHERE tenant_id > 5 AND created_at >= '2026-01-01 00:00:00'
ORDER BY tenant_id ASC;
SELECT tenant_id, status, created_at FROM orders
WHERE tenant_id > 5 AND created_at >= '2026-01-01 00:00:00'
ORDER BY status ASC;
SELECT tenant_id, status, created_at FROM orders
WHERE tenant_id > 5 AND created_at >= '2026-01-01 00:00:00'
ORDER BY created_at DESC;
- Query 1: The
ORDER BYpart follows the exact order when creating the Index, so it can use anIndex Only Scan - Query 2: The
ORDER BYpart is completely opposite to when creating the Index, Postgres still knows how to handle it by just scanning the Index backwards, so it can still use anIndex Only Scan - Query 3: The
ORDER BY tenant_idpart is in the correct order, so it can still be used with the Index - Query 4: The
ORDER BY statuspart starts with the second column, so Postgres cannot recognize the Index to use it - Query 5: Similarly,
ORDER BY created_atstarts with the third column, so it cannot activate the Index either
When you create an additional new Index, it will be stored independently on disk instead of being reused from existing Indexes
- For example, if you create a Composite Index for
(col1, col2)and another Index for(col1, col2, col3), these are 2 independently stored Indexes - In most cases, the Composite Index
(col1, col2, col3)can handle all queries meant for the Index(col1, col2), so you may consider deleting the unnecessary Index to save storage volume - You can use the following query to view the existing Indexes and their utilization level
SELECT
i.indexrelid::regclass AS index_name,
am.amname AS index_type,
pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
s.idx_scan AS number_of_queries_used,
s.idx_tup_read AS tuples_read,
s.idx_tup_fetch AS tuples_fetched
FROM pg_index i
JOIN pg_class c ON c.oid = i.indrelid
JOIN pg_class ic ON ic.oid = i.indexrelid
JOIN pg_am am ON ic.relam = am.oid
JOIN pg_stat_user_indexes s ON s.indexrelid = i.indexrelid
WHERE c.relname = 'orders'
ORDER BY pg_relation_size(i.indexrelid) DESC;- index_type: such as
B-Tree,GIN,GiST,BRIN, etc. - index_size: The size of the index formatted to be readable, such as
KB, MB, GB. - number_of_queries_used: This is the most important column. It shows the number of times the Query Planner actually decided to use this index to scan data.
- tuples_read: The number of rows recorded in the index that have been read.
- tuples_fetched: The number of actual rows fetched from the HEAP through the index.
- You can rely on these values to delete Indexes that occupy large capacities (large
index_size) but are not used effectively (lownumber_of_queries_used)
Happy coding!
Comments
Post a Comment