Data Integrity

Introduction

  • This is a term referring to the accuracy, completeness, consistency and reliability of data throughout its lifecycle, from when it is entered, stored, processed until it is deleted.
  • Data with Integrity is data that correctly reflects objective reality and is not distorted, biased or contaminated due to system errors, human errors or hacker destruction.
  • In database management systems like PostgreSQL, Data Integrity acts like strict rules, preventing any behavior that intentionally or unintentionally makes data absurd.

To ensure data is always clean and correct, PostgreSQL provides the following core constraints:

  • Entity Integrity
    • When creating a table with a primary key, using a Unique Constraint or Unique Index means that the values in this column must be unique
    • Ensures that the system can always distinguish between different entities, there is no such thing as two completely identical data rows or an "anonymous" data row existing.
    • If you do not use the above methods, at the Physical Layer, Postgres still has ctid to distinguish between rows, but ctid is not fixed, so it is best to use a primary key for efficiency
    • Example: In the users table, each person has a unique id. There cannot be two users with the same id = 99.
  • Referential Integrity
    • The relationship between tables must be maintained through a Foreign Key. If table A references a row in table B, then that row in table B must exist.
    • To prevent the phenomenon of Orphan Data.
    • Example: In the order_items table, you have the product_id column = 5. Postgres will check the products table to see if there is any product with id 5. If there is not (or product 5 has been deleted), Postgres will immediately block your command and report an error.
  • Domain Integrity
    • Data entered into a column must comply with the correct format, data type and allowable range of that column.
    • To prevent entering junk data or wrong types.
    • Example:
      • The email column must have a valid email structure.
      • The amount column must be an integer (INT) and must CHECK (amount >= 0). If you intentionally enter amount = -5 or amount = 'Fifty', Postgres will report an error immediately.
  • User-Defined Integrity
    • These are separate Business Logic rules designed by the programmer themselves through the use of Functions, Triggers or Stored Procedures.
    • Example: Banking systems all have regulations that money cannot be withdrawn exceeding the account balance. This is separate logic, which needs to be self-implemented to block if the balance is negative.

Contributing Factors

In reality, data integrity is very easily shaken by several factors:

  • Human Error: Entering the wrong phone number, mistakenly deleting rows, updating without a WHERE condition, leading to an Update Anomaly.
  • Transfer Errors: Data is lost during the process of being sent over the Internet to the Database.
  • Hardware/Software Errors: While writing data to disk, the database suddenly shuts down (possibly due to overhead or power outage), causing the data file to be corrupted halfway, Postgres has a mechanism to resolve this using the WAL log file.
  • Cyberattacks: Hackers break in to change data in the database (Insert/Update/Delete).

Solution

  • The structural design method to achieve Data Integrity is to apply Normalization
  • When you separate tables well (achieving 3NF standard), data redundancy will be eliminated and Data Anomalies will be automatically removed
  • At this point, maintaining Data Integrity will become extremely easy because each piece of information is located in only one unique place, modifying one place is correct for the entire system.
  • When the database architecture is standardized, the mathematical rules of table relationships will protect the data for you, programmers do not need to write code to patch errors manually at the application layer.

Detail

For example, please create a table as follows:

CREATE TABLE products (
    id BIGSERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE product_variants (
    id BIGSERIAL PRIMARY KEY,
    product_id BIGINT REFERENCES products(id),
    sku VARCHAR(100) UNIQUE NOT NULL,
    variation_name VARCHAR(100) NOT NULL,
    price DECIMAL(12, 2) NOT NULL,
    stock INT NOT NULL CHECK (stock >= 0)
);
  • Entity Integrity: When a table creates a primary key as id, it is a unique value, there cannot be two rows with the same id
  • Referential Integrity: When inserting a record into the product_variants table with product_id as a foreign key, a product with the corresponding id must exist in the products table
  • Domain Integrity:
    • Columns created with the NOT NULL keyword require a value to be entered
    • A column with UNIQUE must be unique within that column
    • The stock column with CHECK (stock >= 0) requires the value to always be greater than or equal to 0

Check with the following queries:

-- Query 1
INSERT INTO products (id, name, description) VALUES (1, 'Product 2', 'Description for product 2')

-- Query 2
INSERT INTO product_variants (product_id, sku, variation_name, price, stock)
VALUES (2, 'IP15P-128-BLK2', 'variation name 2', 999.00, 50);

-- Query 3
INSERT INTO product_variants (product_id, sku, price, stock)
VALUES (2, 'sku', 999.00, 50);

-- Query 4
INSERT INTO product_variants (product_id, sku, variation_name, price, stock)
VALUES (2, 'IP15P-128-BLK2', 'variation name 2', 999.00, 50);

-- Query 5
INSERT INTO product_variants (product_id, sku, variation_name, price, stock)
VALUES (2, 'IP15P-128-BLK2', 'variation name 2', 999.00, -1);
  • Query 1: Cannot insert duplicate ids with existing rows
  • Query 2: Cannot insert records referencing a non-existent record
  • Query 3: Error with the variation_name column which requires NOT NULL
  • Query 4: Error with the sku column which must be UNIQUE
  • Query 5: Error with the stock column which is required to be >= 0





Happy coding!

See more articles here.

Comments

Popular posts from this blog

All Practice Series

Kubernetes Deployment for Zero Downtime

Deploying a NodeJS Server on Google Kubernetes Engine

Sitemap

React Practice Series

Setting up Kubernetes Dashboard with Kind

Monitoring with cAdvisor, Prometheus and Grafana on Docker

Helm for beginer - Deploy nginx to Google Kubernetes Engine

A Handy Guide to Using Dynamic Import in JavaScript

Installing PostgreSQL with Docker