Table Partitioning
Introduction
- Table Partitioning is a feature that physically divides a large table into smaller tables (called partitions), but logically, your application still sees it as a single table.
- When creating a table partitioning (parent table), that table is actually just a virtual table containing metadata, it has no corresponding heap file on the hard disk and its storage size is zero.
- On the other hand, each partition (child table) created is an independent physical table. PostgreSQL will allocate a separate heap file for each child table.
Advantages
- Query Performance: When querying data, PostgreSQL will activate the Partition Pruning mechanism to skip irrelevant partitions and only scan the necessary partition, reading only the exact heap file of that partition on the disk.
- Data Lifecycle Management: Data retention is extremely efficient because you do not have to operate on a single massive table, which would cause table locks and generate many WAL files that slow down the system
- For instance, when deleting data, there is no need to mark each individual row for deletion like the DELETE command.
- Instead, you only need to detach the specific partition from the main table and then drop it, which means deleting the exact heap file on disk
- Memory Optimization: Maintenance commands such as VACUUM or ANALYZE will run very quickly on each small partition and indexes of smaller tables can be loaded into RAM more easily, boosting read and write speeds.
Use cases
You should consider carefully when choosing to use partitioning because it increases complexity in design and management and should only be used for cases such as
- The table size is large enough, such as hundreds of millions of records, making the table storage size on disk larger than about 20GB, which exceeds the system RAM capacity to process efficiently
- When data can be classified by a time axis or categories
- Time-series / Logs data: Tables storing system logs, bank transaction history, IoT sensor data... where new data is continuously loaded over time and old data is rarely modified.
- Data partitioned by region or object: Such as data divided by region (North, South) or status.
- When the query pattern possesses Locality characteristics
- If your queries frequently include a WHERE condition tied to a fixed time interval or a specific data region (for example, View this month's revenue report, Find yesterday's log).
- Note that you should clearly determine the partitioning condition before creation, otherwise, querying with conditions different from the existing partition can make the system even slower because it has to check all current partitions
- In practice, when starting out, you should also create and use tables normally, only when the table size is large enough to cause performance issues and you see a clear factor that can be divided into partitions to accelerate queries and easily drop old data
Declarative Partitioning
PostgreSQL supports 3 main partitioning strategies
- Range Partitioning: Dividing by continuous value ranges (usually dates or ID numbers). For example, dividing the logs table by month: logs_2026_01, logs_2026_02...
- List Partitioning: Dividing by a list of specific, clear values. For example, dividing the customers table by country: customers_usa, customers_japan.
- Hash Partitioning: Using a Hash algorithm to evenly distribute data into a fixed number of tables. For example, dividing the users table into 4 partitions using hash(user_id) % 4 to reduce write (I/O) load for a single table when no suitable time axis can be found.
Multi-column Partitioning
- You can completely partition based on multiple columns, but keep in mind that if the table already has a primary key and you need to create a partition on other columns, you must combine all these columns together to form a composite primary key
- Although Postgres allows it, you should consider carefully when using it because it increases management complexity, you should instead combine Table partitioning with Sub-partitioning, which means dividing into child tables based on one condition, then relying on the next condition to divide that child table into even smaller tables
Table Partitioning and Partial Index
Both are similar in that they are used to query tables with large amounts of data efficiently, but they have different usage conditions as follows
- Partial Index
- The table is large but queries only focus on rows with a specific status (accounting for < 10% of the table).
- Want to accelerate queries but do not want to change the table structure and do not want to complicate management.
- Table Partitioning
- Data increases over time and old data needs to be deleted periodically.
- Want to reduce storage capacity effectively.
- Need to distribute data according to geographical areas (Regions, Countries).
Detail
First, create the tables as follows
CREATE TABLE users (
user_id SERIAL PRIMARY KEY,
username VARCHAR(50) NOT NULL,
email VARCHAR(100),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) PARTITION BY HASH (user_id);
CREATE TABLE customers (
customer_id INT NOT NULL,
name VARCHAR(100),
country_code VARCHAR(10) NOT NULL
) PARTITION BY LIST (country_code);
CREATE TABLE orders (
order_id INT NOT NULL,
country VARCHAR(10) NOT NULL,
amount DECIMAL(10,2),
created_at DATE NOT NULL,
PRIMARY KEY (order_id, created_at, country)
) PARTITION BY RANGE (created_at);
The information above is for the parent table
- You can see that with the
userstable, I have specified usingHash Partitioningwith theuser_idcolumn - The
customerstable usesList Partitioningaccording tocountry_code - The
orderstable usesRange Partitioning, thePRIMARY KEY (order_id, created_at, country)part includes the Primary key of this table which isorder_idand all columns used to create partitions likecreated_at, country, addingcountryis because I will use Sub-partitioning next for this column below
Next is creating partitions for the users table, with 3 partitions, when inserting data, the user information will be randomly distributed into these partitions
CREATE TABLE users_p0 PARTITION OF users FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE users_p1 PARTITION OF users FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE users_p2 PARTITION OF users FOR VALUES WITH (MODULUS 3, REMAINDER 2);
This is the child table for the customers table divided by country zones
CREATE TABLE customers_asia PARTITION OF customers
FOR VALUES IN ('CH', 'JP', 'KR', 'SG');
CREATE TABLE customers_europe PARTITION OF customers
FOR VALUES IN ('FR', 'DE', 'UK');
Next is creating Sub-partitioning according to the time frame H1 which is the first half of the year (2026-01-01 to 2026-07-01) and H2 which is the second half of the year (2026-07-01 to 2027-01-01) and partitions by country
CREATE TABLE orders_2026_h1 PARTITION OF orders
FOR VALUES FROM ('2026-01-01') TO ('2026-07-01')
PARTITION BY LIST (country);
CREATE TABLE orders_2026_h2 PARTITION OF orders
FOR VALUES FROM ('2026-07-01') TO ('2027-01-01')
PARTITION BY LIST (country);
CREATE TABLE orders_2026_h1_us PARTITION OF orders_2026_h1 FOR VALUES IN ('US');
CREATE TABLE orders_2026_h1_jp PARTITION OF orders_2026_h1 FOR VALUES IN ('JP');
CREATE TABLE orders_2026_h1_default PARTITION OF orders_2026_h1 DEFAULT;
CREATE TABLE orders_2026_h2_us PARTITION OF orders_2026_h2 FOR VALUES IN ('US');
CREATE TABLE orders_2026_h2_jp PARTITION OF orders_2026_h2 FOR VALUES IN ('JP');
CREATE TABLE orders_2026_h2_default PARTITION OF orders_2026_h2 DEFAULT;
You can check partitions and the number of rows created according to the parent table as follows
SELECT relname AS partition_table_name, n_live_tup AS row_count
FROM pg_stat_user_tables
WHERE relname LIKE 'users_%'
ORDER BY relname;
You can verify that partitioning is working as follows
select * from users;
select * from users_p0;
SELECT * FROM users WHERE user_id = 2;
SELECT * FROM users WHERE username = 'user_2';
- You can see that when querying the
userstable, it will perform a Seq Scan on all table partitions, whereas when querying with an appropriate condition, it will automatically query exactly into the specific table partition - Therefore, please keep this in mind, avoiding partition creation for columns that are not used, which will cause the table partitioning to lose its effect
select * from orders;
select * from orders where country = 'JP' and created_at > '2026-07-01';
select * from customers;
select * from customers where country_code = 'JP';
Happy coding!
Comments
Post a Comment