Standard View and Materialized View
Introduction
View
- Also known as
Standard VieworVirtual Table, this is essentially just an SQL query statement saved under a specific name. - It does not store any data on the disk, it only stores the definition (SQL code block) within the system.
- Every time you execute
SELECT * FROM my_view, Postgres runs the underlying SQL definition of that View directly against the base tables. - The data is always up to date, if the source tables change, the View immediately reflects those modifications.
- Performance depends entirely on the original SQL query, if the original query is complex involving multiple table
JOINoperations orAGGREGATEcalculations, the View will run slowly because Postgres must recompute everything from scratch upon every invocation.
Materialized View
- This is a View but it physically stores data on the disk just like a regular table.
- It saves the entire result of the query onto the disk at the moment it is created or refreshed.
- When you execute
SELECT * FROM my_mat_view, Postgres reads the pre-saved data from the disk like a normal table, without needing to re-run complex SQL queries or reconnect the base tables. - It does not update automatically, data is frozen at the time of creation or refreshment. If the base tables change, the
Materialized Viewbecomes stale until you actively execute a command to refresh it. - Query speed is exceptionally fast for complex computations, reports or data aggregation, because the data is pre-calculated, it essentially operates as a caching mechanism at the database layer.
- You can create indexes on a
Materialized Viewto accelerate search speeds, which is something a standard View cannot achieve.
Refresh Methods
Postgres provides the REFRESH MATERIALIZED VIEW command with two main operational mechanisms:
- Refresh with Lock
- Postgres will delete all old data in the View and re-run the original SQL query to load fresh data.
- This command applies an
Exclusive Lockon the Materialized View, during the refresh process which can take from several seconds to a few minutes depending on the volume of data, you cannot executeSELECTstatements on this View, user queries will hang until the refresh completes.
- Concurrent Refresh
- Postgres compares the old data with the new data, then only applies the changes (
INSERT, UPDATE, DELETE) to the View. - Its advantage is that it does not require locking the table, users can still execute
SELECTstatements to read data normally while the system is performing the refresh. - The mandatory condition to use
CONCURRENTLYis that yourMaterialized Viewmust have at least oneUNIQUE INDEXon one column or a group of columns.
- Postgres compares the old data with the new data, then only applies the changes (
Refresh Strategies
Since Postgres does not automatically refresh when the base tables change, you need to set up the execution timing based on your project requirements:
- Scheduled / Cron Job
- This is the most popular approach for reporting systems like dashboards and analytics, where data does not need to be real-time every second and a certain delay is acceptable.
- Use external scheduling tools such as
Linux Cron,Airflowor thepg_cronextension running directly inside Postgres to automatically invoke the refresh command. - Recommended frequencies:
- Every night at 2:00 AM for financial reports or daily summaries.
- Every 15 to 30 minutes for dashboard charts tracking progress.
- Event-Driven / Trigger
- If you want the Materialized View to update immediately or almost immediately when the base tables change, you can use a TRIGGER.
- Write a trigger function attached to the base table, every time the base table encounters an
INSERT, UPDATE, DELETEaction, this trigger will automatically invoke theREFRESHcommand. - This method should only be used when the write frequency to the base table is minimal, if the base table is written to continuously hundreds of times per second, the trigger will force Postgres to refresh constantly, leading to a bottleneck and severely impacting system performance.
- On-Demand
- Provide a
Refreshbutton on the Admin or Dashboard user interface, when the user clicks it, the application sends aREFRESHcommand to the database. - Suitable for heavy reports where users decide for themselves when they need the latest figures for analysis.
- Provide a
Use cases
- Choose
Standard Viewwhen: You need security to restrict access permissions, cleaner code structure, and the data in the base tables changes continuously but you must retrieve accurate real-time data every second. - Choose
Materialized Viewwhen: You need to build complex statistical reports such as total monthly revenue or daily user statistics, this data does not need to be accurate to the exact second, you can utilize an additionalcronjobtoREFRESHit every hour or every night.
Detail
First, create the table as follows:
CREATE TABLE categories (
category_id SERIAL PRIMARY KEY,
category_name VARCHAR(100) NOT NULL
);
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
product_name VARCHAR(255) NOT NULL,
price NUMERIC(12, 2) NOT NULL,
category_id INT REFERENCES categories(category_id)
);
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
product_id INT REFERENCES products(product_id),
quantity INT NOT NULL,
total_amount NUMERIC(12, 2) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Then, you create and use a Standard View like this:
CREATE OR REPLACE VIEW view_realtime_category_sales AS
SELECT
c.category_id,
c.category_name,
COUNT(o.order_id) AS total_orders,
SUM(o.quantity) AS total_products_sold,
SUM(o.total_amount) AS total_revenue
FROM categories c
JOIN products p ON c.category_id = p.category_id
JOIN orders o ON p.product_id = o.product_id
GROUP BY c.category_id, c.category_name;
SELECT * FROM view_realtime_category_sales;
When using it, just execute a standard SELECT statement concisely to get the latest data from that View.
And create a Materialized View like this:
CREATE MATERIALIZED VIEW mv_category_sales_analytics AS
SELECT
c.category_id,
c.category_name,
COUNT(o.order_id) AS total_orders,
SUM(o.quantity) AS total_products_sold,
SUM(o.total_amount) AS total_revenue
FROM categories c
JOIN products p ON c.category_id = p.category_id
JOIN orders o ON p.product_id = o.product_id
GROUP BY c.category_id, c.category_name;
CREATE UNIQUE INDEX idx_mv_cat_sales_id ON mv_category_sales_analytics (category_id);
SELECT * FROM mv_category_sales_analytics;
REFRESH MATERIALIZED VIEW mv_category_sales_analytics;
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_category_sales_analytics;
- You need to create a
UNIQUE INDEXto be able to refresh withCONCURRENTLY. - When executing a select query, it will be extremely fast because all data is already saved, but if the source data changes, this View does not change automatically.
- Manual refresh is required when you want to update the View.
Happy coding!
Comments
Post a Comment