Composite Index

Introduction

A Composite Index is an index type that contains two or more columns on the same table. It includes the following characteristics

  • A Composite Index can be created for a maximum of 32 columns
    • This is the default configuration according to the INDEX_MAX_KEYS constant of Postgres
    • In practice, you should not create an index with more than 3 to 4 columns because it increases the index size, slowing down INSERT/UPDATE/DELETE operations
  • The column order is sorted ascendingly by default
    • However, unlike a standard index where you can sort ascending or descending at will, when using a Composite Index, you can only sort the columns all ascending, all descending or in the exact order specified at creation time
    • For example, when using ON table (c1, c2) (default is all ascending), then
      • You can query ORDER BY c1 ASC, c2 ASC
      • You can query ORDER BY c1 DESC, c2 DESC
      • But the index will not work with the query ORDER BY c1 ASC, c2 DESC

Operation Order

When using a Composite Index, you must query in the exact order from left to right as specified at creation time for the index to work For example, you create an index like this

CREATE INDEX idx_name ON table (c1, c2, c3)

Then the following queries can activate the index

SELECT * FROM table WHERE c1 = 1;
SELECT * FROM table WHERE c1 = 1 AND c2 = 2;
SELECT * FROM table WHERE c1 = 1 AND c2 = 2 AND c3 = 3;

The index will not work with these queries

SELECT * FROM table WHERE c2 = 2;
SELECT * FROM table WHERE c3 = 3;

To ensure the index operates efficiently, apply the following design principles:

  • High Cardinality First: The column with the least duplicate values (such as user_id, email) should be placed at the first position on the left, while columns with many duplicates (such as gender, status) are placed behind.
    • Because when a query retrieves records with low duplication first, the next condition only needs to process on that small dataset, which is more efficient
    • Conversely, if you retrieve high duplication records first, the next condition will have to process more on this large dataset
  • Prioritize Equality Operators: Columns used with equality operations (=, IS NULL) should stand before columns used with range or comparison operations (<, >, LIKE, BETWEEN).
    • The explanation is similar to the above because an equality comparison operation usually retrieves a smaller amount of records and processing the range or comparison operation based on that dataset will be more efficient

Detail

First, let us create the table and Index for 3 columns as follows

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    tenant_id INT NOT NULL,
    status VARCHAR(20) NOT NULL,
    created_at TIMESTAMP NOT NULL,
    total_amount NUMERIC(10, 2),
    customer_email VARCHAR(100)
);

CREATE INDEX idx_orders_tenant_status_date ON orders (tenant_id ASC, status ASC, created_at DESC);

Above, I created the Index explicitly to clearly show the sorting order of the columns, but if you do not use ASC, the columns will also default to ascending order


SELECT tenant_id, status, created_at FROM orders WHERE tenant_id = 5

SELECT tenant_id, status, created_at FROM orders
WHERE tenant_id = 5 AND created_at >= '2026-01-01 00:00:00'
  
SELECT tenant_id, status, created_at FROM orders WHERE status = 'completed'

SELECT tenant_id, status, created_at FROM orders WHERE created_at >= '2026-01-01 00:00:00'
  • Query 1: Comparing with the tenant_id field, which is the first column in the Index, so this query works well with the Index
  • Query 2: The Index will still be activated because it filters by the condition of the tenant_id column first and once the data is available, it only needs to filter further by the condition of created_at
  • Query 3: Uses a Seq Scan because filtering by the status column does not match the correct order of the Index
  • Query 4: Similar to the above, filtering by the created_at column also does not match the order in the Index, so it will use a Seq Scan




Check the queries with ordering as follows

SELECT tenant_id, status, created_at FROM orders
WHERE tenant_id > 5 AND created_at >= '2026-01-01 00:00:00'
ORDER BY tenant_id ASC, status ASC, created_at DESC;

SELECT tenant_id, status, created_at FROM orders
WHERE tenant_id > 5 AND created_at >= '2026-01-01 00:00:00'
ORDER BY tenant_id DESC, status DESC, created_at ASC;

SELECT tenant_id, status, created_at FROM orders
WHERE tenant_id > 5 AND created_at >= '2026-01-01 00:00:00'
ORDER BY tenant_id ASC;

SELECT tenant_id, status, created_at FROM orders
WHERE tenant_id > 5 AND created_at >= '2026-01-01 00:00:00'
ORDER BY status ASC;

SELECT tenant_id, status, created_at FROM orders
WHERE tenant_id > 5 AND created_at >= '2026-01-01 00:00:00'
ORDER BY created_at DESC;
  • Query 1: The ORDER BY part follows the exact order when creating the Index, so it can use an Index Only Scan
  • Query 2: The ORDER BY part is completely opposite to when creating the Index, Postgres still knows how to handle it by just scanning the Index backwards, so it can still use an Index Only Scan
  • Query 3: The ORDER BY tenant_id part is in the correct order, so it can still be used with the Index
  • Query 4: The ORDER BY status part starts with the second column, so Postgres cannot recognize the Index to use it
  • Query 5: Similarly, ORDER BY created_at starts with the third column, so it cannot activate the Index either






When you create an additional new Index, it will be stored independently on disk instead of being reused from existing Indexes

  • For example, if you create a Composite Index for (col1, col2) and another Index for (col1, col2, col3), these are 2 independently stored Indexes
  • In most cases, the Composite Index (col1, col2, col3) can handle all queries meant for the Index (col1, col2), so you may consider deleting the unnecessary Index to save storage volume
  • You can use the following query to view the existing Indexes and their utilization level
SELECT
    i.indexrelid::regclass AS index_name,
    am.amname AS index_type,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
    s.idx_scan AS number_of_queries_used,
    s.idx_tup_read AS tuples_read,
    s.idx_tup_fetch AS tuples_fetched
FROM pg_index i
JOIN pg_class c ON c.oid = i.indrelid
JOIN pg_class ic ON ic.oid = i.indexrelid
JOIN pg_am am ON ic.relam = am.oid
JOIN pg_stat_user_indexes s ON s.indexrelid = i.indexrelid
WHERE c.relname = 'orders'
ORDER BY pg_relation_size(i.indexrelid) DESC;

  • index_type: such as B-Tree, GIN, GiST, BRIN, etc.
  • index_size: The size of the index formatted to be readable, such as KB, MB, GB.
  • number_of_queries_used: This is the most important column. It shows the number of times the Query Planner actually decided to use this index to scan data.
  • tuples_read: The number of rows recorded in the index that have been read.
  • tuples_fetched: The number of actual rows fetched from the HEAP through the index.
  • You can rely on these values to delete Indexes that occupy large capacities (large index_size) but are not used effectively (low number_of_queries_used)

Happy coding!

See more articles here.

Comments

Popular posts from this blog

All Practice Series

Kubernetes Deployment for Zero Downtime

Deploying a NodeJS Server on Google Kubernetes Engine

Sitemap

React Practice Series

Setting up Kubernetes Dashboard with Kind

Monitoring with cAdvisor, Prometheus and Grafana on Docker

Helm for beginer - Deploy nginx to Google Kubernetes Engine

A Handy Guide to Using Dynamic Import in JavaScript

Installing PostgreSQL with Docker