TOAST Storage Strategies
Introduction
- As mentioned previously regarding storing data into HEAP, Postgres enforces a strict rule where a Row/Tuple must fit entirely within a single Page (8KB) and cannot overflow into another Page. The maximum size of a Row ranges from approximately 2KB to 8KB.
- If you intentionally insert a very long TEXT value or a file of several MBs into a row, an 8KB Page cannot accommodate such large data, prompting Postgres to trigger a mechanism called TOAST (The Oversized-Attribute Storage Technique).
When you insert a data row whose size exceeds the allowed threshold (typically around 2KB), Postgres will not insert the entire row into the main HEAP. It executes the following three steps:
- Data compression: First, Postgres attempts to compress the oversized data to see if it fits within the 8KB Page. If successful, it is still inserted into the main HEAP.
- Chunking and moving to TOAST: If the data remains too large after compression, Postgres splits that 5MB data into multiple small chunks (each under 2KB). It then stores these chunks in the TOAST area.
- Within the main HEAP table, right at the data cell of that row, Postgres leaves a Pointer containing the information needed to access the full data inside TOAST.
Advantages
TOAST brings significant benefits to the storage architecture of Postgres, including:
- Overcoming the 8KB Page limit: Helps Postgres break the physical boundary of a data block, allowing a single field to scale up to 1GB.
- Optimizing Buffer Cache and Table Scan Speed:
- This is the greatest advantage. When you run a query that does not retrieve large data columns, Postgres only needs to scan the main table. Because large data is stored in TOAST within another table, the main table size remains extremely compact.
- More rows will fit into an 8KB page, which increases the Cache hit ratio and accelerates standard Queries significantly.
- Automatic Data Compression: TOAST defaults to using a compression algorithm (PGLZ or LZ4) before pushing data to the secondary table. This saves substantial disk space.
- Ensuring Data Integrity (ACID): Since files or text reside within the database, performing a
ROLLBACKorCOMMITon this large data synchronizes completely with other data, eliminating the risk of a "Database reports saved but File Server fails to find file" scenario. - Avoiding Large Data Fragmentation on the Main Table: Keeps the main table from bloating during
UPDATEoperations on large data fields.
Disadvantages
These are the limitations of TOAST that should be noted during use:
- High I/O Overhead: When selecting a column containing large data, Postgres must perform two steps: locate the pointer in the main table, then access the TOAST table to retrieve and decompress the fragmented data chunks. This causes Random I/O, noticeably slowing down the query if you abuse
SELECT *. - Risk of Data Bloat: Continuously performing an
UPDATEon a large data field in TOAST writes a new copy to the TOAST table according to the MVCC mechanism of Postgres. If theVACUUMprocess does not clean it up in time, the TOAST table will bloat and degrade system performance. - High CPU Resource Consumption: Compressing upon writing and decompressing upon reading continuously consumes substantial CPU power if these large fields are accessed too frequently.
TOAST strategy
- The TOAST strategy, also known as the Storage Strategy, is the method Postgres uses to control data handling when it exceeds limits, defining 4 strategy types applied by default to each data type in Postgres.
- Although these strategies are already highly optimized for each data type and you rarely need to modify them, understanding how each strategy functions allows you to adjust it according to your needs.
EXTENDED
- This is the most versatile type, chosen for most large data types like TEXT, BYTEA and JSONB. When utilized, Postgres attempts to compress the data first. If the row remains too large after compression, it pushes the excess data to the secondary TOAST table.
- Intended for long text data, configurations or files that compress well but are rarely queried concurrently with basic information.
- Examples include the post content column in a blogs table, the payload_json column in a webhooks table or the debug_trace column of a logs table.
- It helps keep the main table compact. When querying a list of articles by selecting only id, title and author, Postgres scans the main table very quickly because the heavy content field is stored in the TOAST table.
MAIN
- Optimizes read speed for moderate data that is only slightly larger than the TOAST threshold (around 2KB - 3KB) so that Postgres can store it directly in the HEAP after compression. Only when compressed to the maximum limit and the row still fails to fit the 8KB page does it push the data to the TOAST table.
- Examples include biography info for introductions or full addresses.
- Use this strategy when the data in that column is large but must be accessed frequently, making data compression and attempting to store it in the HEAP beneficial for faster queries by avoiding random I/O (the process of querying the TOAST table to fetch data).
EXTERNAL
- Reserved for large data that is already compressed and cannot be compressed further, or when you need to optimize CPU resources and accept higher disk space consumption.
- Examples:
- Storing pre-compressed files like images (JPEG, PNG), Videos (MP4) or compressed files (ZIP, GZ) in BYTEA format.
- Encrypted strings, long Hash strings or JWTs.
- A massive TEXT column where you frequently perform a SUBSTRING, allowing Postgres to slice the string directly on the TOAST table without decompressing the entire data set.
- If EXTENDED is applied to these already optimized and uncompressible data types, Postgres wastes CPU trying to compress them further without reducing size, leading to CPU overload. EXTERNAL should be used to disable compression and store raw data in the TOAST table, maximizing CPU savings.
PLAIN
- Reserved for columns that are guaranteed to be small, have a fixed size or must participate in core processing operations like Indexing and Sorting. These data columns will neither be compressed nor moved to TOAST, meaning if the row data exceeds the Page size (8KB), it cannot be saved and will throw an error.
- Examples include fixed-size data types such as INTEGER, BIGINT, UUID, TIMESTAMP, BOOLEAN or VARCHAR(50).
- Ensures data always resides in the main table, never incurring overhead from the TOAST mechanism, thereby keeping data queries and indexing highly efficient.
Comparing EXTENDED and MAIN
- If you wonder why both EXTENDED and MAIN compress data first and store it in the HEAP if it becomes small enough, otherwise moving it to TOAST, here is how they differ.
- The main distinction lies in the priority order and the actions Postgres performs before saving data to TOAST.
- First, you should know that Postgres has a default configuration parameter called TOAST_TUPLE_THRESHOLD valued at 2KB (2048 bytes). This serves as the threshold where a data row exceeding this size triggers TOAST storage.
- When you INSERT or UPDATE a data row, Postgres checks the total size of that row. If it exceeds the 2KB threshold, Postgres executes a strict 4-step process to compress the row, leading to the decision to store data in either the HEAP or TOAST:
- Step 1: Compress EXTENDED and MAIN columns, then if the row size drops below 2KB, save the row into the HEAP.
- Step 2: If the row remains larger than 2KB after compression, Postgres pushes EXTENDED columns to the TOAST table while keeping MAIN columns in the row, which marks the core difference.
- Step 3: If the row is still larger than 2KB, it pushes EXTERNAL columns to the TOAST table.
- Step 4: At this point, if the EXTENDED and EXTERNAL columns have all been moved to TOAST but the row size still exceeds 2KB, it is forced to move MAIN columns to the TOAST table as well.
Comparing EXTENDED and PLAIN
- If you wonder why PLAIN is used to store data when it throws an error if the size is too large, instead of using EXTENDED to automatically transition to TOAST for safety.
- The answer is to guarantee performance, because columns using PLAIN are intended by users to stay in the HEAP without data compression to ensure indexing and queries run efficiently, meaning its size must remain fixed.
- Postgres deliberately designs a fail-fast mechanism for PLAIN to throw an error if the stored column value somehow exceeds the fixed size, rather than automatically moving it to TOAST like EXTENDED, which prevents silent performance degradation despite avoiding errors.
Detail
First, create a table containing data types that can be stored in TOAST, such as TEXT and Bytea, then insert data into it:
CREATE TABLE test (
id SERIAL PRIMARY KEY,
integer_col INT NULL,
text_col TEXT NULL,
binary_col BYTEA NULL
);
INSERT INTO public.test(integer_col, text_col, binary_col)
VALUES (
1,
$$ your long value here
could be break line $$,
pg_read_binary_file('/path-to-file')
);
Consider the compression algorithms supported by Postgres when handling large column data to make a suitable choice:
- PGLZ (PostgreSQL Lempel–Ziv) is used by default, yielding a volume after compression that is 7%-10% smaller than LZ4, making it suitable when you prioritize disk volume.
- LZ4 (Lempel–Ziv 4) focuses on speed and while the compressed file size is larger, it only consumes about 20% of the CPU processing time for compression and decompression compared to PGLZ, making it ideal when you prioritize high-speed data operations.
To change the decompression algorithm, proceed as follows:
ALTER TABLE test ALTER COLUMN binary_col SET COMPRESSION lz4;
To modify the storage method in a column, do the following, noting that alter table operations only apply to newly inserted data while old data remains unchanged:
ALTER TABLE test ALTER COLUMN text_col SET STORAGE EXTERNAL;
Checking the compression processing results will reveal the differences.
You can check the volume of the main table and the volume of TOAST like this:
SELECT
relname AS table_name,
pg_size_pretty(pg_relation_size(oid)) AS table_volume,
reltoastrelid::regclass AS table_TOAST_name,
pg_size_pretty(pg_relation_size(reltoastrelid)) AS table_TOAST_volume
FROM pg_class
WHERE relname = 'test';
You can also view the length and size of each column to see the differences in data before and after compression:
SELECT
id,
length(text_col) AS text_col_length,
pg_column_size(text_col) AS text_col_size,
length(binary_col) AS binary_col_length,
pg_column_size(binary_col) AS binary_col_size
FROM test;
This is how to use Substring, which operates directly on the HEAP or TOAST:
SELECT id, SUBSTRING(text_col FROM 1 FOR 50) AS preview FROM test;
Happy coding!
Comments
Post a Comment