Table Partitioning

Introduction

  • Table Partitioning is a feature that physically divides a large table into smaller tables (called partitions), but logically, your application still sees it as a single table.
  • When creating a table partitioning (parent table), that table is actually just a virtual table containing metadata, it has no corresponding heap file on the hard disk and its storage size is zero.
  • On the other hand, each partition (child table) created is an independent physical table. PostgreSQL will allocate a separate heap file for each child table.

Advantages

  • Query Performance: When querying data, PostgreSQL will activate the Partition Pruning mechanism to skip irrelevant partitions and only scan the necessary partition, reading only the exact heap file of that partition on the disk.
  • Data Lifecycle Management: Data retention is extremely efficient because you do not have to operate on a single massive table, which would cause table locks and generate many WAL files that slow down the system
    • For instance, when deleting data, there is no need to mark each individual row for deletion like the DELETE command.
    • Instead, you only need to detach the specific partition from the main table and then drop it, which means deleting the exact heap file on disk
  • Memory Optimization: Maintenance commands such as VACUUM or ANALYZE will run very quickly on each small partition and indexes of smaller tables can be loaded into RAM more easily, boosting read and write speeds.

Use cases

You should consider carefully when choosing to use partitioning because it increases complexity in design and management and should only be used for cases such as

  • The table size is large enough, such as hundreds of millions of records, making the table storage size on disk larger than about 20GB, which exceeds the system RAM capacity to process efficiently
  • When data can be classified by a time axis or categories
    • Time-series / Logs data: Tables storing system logs, bank transaction history, IoT sensor data... where new data is continuously loaded over time and old data is rarely modified.
    • Data partitioned by region or object: Such as data divided by region (North, South) or status.
  • When the query pattern possesses Locality characteristics
    • If your queries frequently include a WHERE condition tied to a fixed time interval or a specific data region (for example, View this month's revenue report, Find yesterday's log).
    • Note that you should clearly determine the partitioning condition before creation, otherwise, querying with conditions different from the existing partition can make the system even slower because it has to check all current partitions
  • In practice, when starting out, you should also create and use tables normally, only when the table size is large enough to cause performance issues and you see a clear factor that can be divided into partitions to accelerate queries and easily drop old data

Declarative Partitioning

PostgreSQL supports 3 main partitioning strategies

  • Range Partitioning: Dividing by continuous value ranges (usually dates or ID numbers). For example, dividing the logs table by month: logs_2026_01, logs_2026_02...
  • List Partitioning: Dividing by a list of specific, clear values. For example, dividing the customers table by country: customers_usa, customers_japan.
  • Hash Partitioning: Using a Hash algorithm to evenly distribute data into a fixed number of tables. For example, dividing the users table into 4 partitions using hash(user_id) % 4 to reduce write (I/O) load for a single table when no suitable time axis can be found.

Multi-column Partitioning

  • You can completely partition based on multiple columns, but keep in mind that if the table already has a primary key and you need to create a partition on other columns, you must combine all these columns together to form a composite primary key
  • Although Postgres allows it, you should consider carefully when using it because it increases management complexity, you should instead combine Table partitioning with Sub-partitioning, which means dividing into child tables based on one condition, then relying on the next condition to divide that child table into even smaller tables

Table Partitioning and Partial Index

Both are similar in that they are used to query tables with large amounts of data efficiently, but they have different usage conditions as follows

  • Partial Index
    • The table is large but queries only focus on rows with a specific status (accounting for < 10% of the table).
    • Want to accelerate queries but do not want to change the table structure and do not want to complicate management.
  • Table Partitioning
    • Data increases over time and old data needs to be deleted periodically.
    • Want to reduce storage capacity effectively.
    • Need to distribute data according to geographical areas (Regions, Countries).

Detail

First, create the tables as follows

CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    username VARCHAR(50) NOT NULL,
    email VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) PARTITION BY HASH (user_id);

CREATE TABLE customers (
    customer_id INT NOT NULL,
    name VARCHAR(100),
    country_code VARCHAR(10) NOT NULL
) PARTITION BY LIST (country_code);

CREATE TABLE orders (
    order_id INT NOT NULL,
    country VARCHAR(10) NOT NULL,
    amount DECIMAL(10,2),
    created_at DATE NOT NULL,
    PRIMARY KEY (order_id, created_at, country)
) PARTITION BY RANGE (created_at);

The information above is for the parent table

  • You can see that with the users table, I have specified using Hash Partitioning with the user_id column
  • The customers table uses List Partitioning according to country_code
  • The orders table uses Range Partitioning, the PRIMARY KEY (order_id, created_at, country) part includes the Primary key of this table which is order_id and all columns used to create partitions like created_at, country, adding country is because I will use Sub-partitioning next for this column below

Next is creating partitions for the users table, with 3 partitions, when inserting data, the user information will be randomly distributed into these partitions

CREATE TABLE users_p0 PARTITION OF users FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE users_p1 PARTITION OF users FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE users_p2 PARTITION OF users FOR VALUES WITH (MODULUS 3, REMAINDER 2);

This is the child table for the customers table divided by country zones

CREATE TABLE customers_asia PARTITION OF customers
    FOR VALUES IN ('CH', 'JP', 'KR', 'SG');

CREATE TABLE customers_europe PARTITION OF customers
    FOR VALUES IN ('FR', 'DE', 'UK');

Next is creating Sub-partitioning according to the time frame H1 which is the first half of the year (2026-01-01 to 2026-07-01) and H2 which is the second half of the year (2026-07-01 to 2027-01-01) and partitions by country

CREATE TABLE orders_2026_h1 PARTITION OF orders
    FOR VALUES FROM ('2026-01-01') TO ('2026-07-01')
    PARTITION BY LIST (country);

CREATE TABLE orders_2026_h2 PARTITION OF orders
    FOR VALUES FROM ('2026-07-01') TO ('2027-01-01')
    PARTITION BY LIST (country);

CREATE TABLE orders_2026_h1_us PARTITION OF orders_2026_h1 FOR VALUES IN ('US');
CREATE TABLE orders_2026_h1_jp PARTITION OF orders_2026_h1 FOR VALUES IN ('JP');
CREATE TABLE orders_2026_h1_default PARTITION OF orders_2026_h1 DEFAULT;

CREATE TABLE orders_2026_h2_us PARTITION OF orders_2026_h2 FOR VALUES IN ('US');
CREATE TABLE orders_2026_h2_jp PARTITION OF orders_2026_h2 FOR VALUES IN ('JP');
CREATE TABLE orders_2026_h2_default PARTITION OF orders_2026_h2 DEFAULT;

You can check partitions and the number of rows created according to the parent table as follows

SELECT relname AS partition_table_name, n_live_tup AS row_count
FROM pg_stat_user_tables
WHERE relname LIKE 'users_%'
ORDER BY relname;


You can verify that partitioning is working as follows

select * from users;
select * from users_p0;
SELECT * FROM users WHERE user_id = 2;
SELECT * FROM users WHERE username = 'user_2';
  • You can see that when querying the users table, it will perform a Seq Scan on all table partitions, whereas when querying with an appropriate condition, it will automatically query exactly into the specific table partition
  • Therefore, please keep this in mind, avoiding partition creation for columns that are not used, which will cause the table partitioning to lose its effect




select * from orders;
select * from orders where country = 'JP' and created_at > '2026-07-01';
select * from customers;
select * from customers where country_code = 'JP';




Happy coding!

See more articles here.

Comments

Popular posts from this blog

All Practice Series

Kubernetes Deployment for Zero Downtime

Deploying a NodeJS Server on Google Kubernetes Engine

Sitemap

React Practice Series

A Handy Guide to Using Dynamic Import in JavaScript

Helm for beginer - Deploy nginx to Google Kubernetes Engine

DevOps Practice Series

Docker Practice Series

Setting up Kubernetes Dashboard with Kind