Seed Data for PostgreSQL with pg-promise and faker

Introduction

pg-promise is a powerful library for Node.js dedicated to interacting with PostgreSQL databases. This library is built on top of the native pg driver but provides a high-level abstraction layer with a Promise-based architecture, making the source code cleaner, easier to read and more maintainable.

Advantages

Automatically manages connections and transactions safely
Features a built-in powerful SQL formatting system to prevent SQL Injection attacks
Supports handling external SQL files via Query Files and offers exceptionally high performance for bulk data insertion thanks to optimized utilities like pgp.helpers.insert and pgp.helpers.update.

Prerequisites

In the previous article, I provided instructions on using Prisma to seed data, but using ORMs always introduces limitations such as lack of support for all data types as well as advanced database features. Since ORMs are designed to work with various database types, they sacrifice specific functionalities to solve general problems
In this article, I will guide you through using pg-promise and faker to seed data, which offers the following advantages
- It does not use any ORM, so there is no need to synchronize the schema between the ORM and the database before use
- You can apply database-specific features directly before seeding data
- You only need to handle data faking and inserting into the database

Detail

You should install these packages

yarn add pg-promise dotenv

For example, suppose you need to create a database with the following tables

CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE warehouse_dock (
    id SERIAL PRIMARY KEY,
    dock_name VARCHAR(50) NOT NULL,
    location VARCHAR(100)
);

CREATE TABLE truck_schedule (
    id SERIAL PRIMARY KEY,
    truck_plate_number VARCHAR(20) NOT NULL,
    warehouse_dock_id INT REFERENCES warehouse_dock(id),
    operating_time tstzrange NOT NULL,
    activity_type VARCHAR(20) CHECK (activity_type IN ('INBOUND', 'OUTBOUND', 'OUT_FOR_DELIVERY')),

    CONSTRAINT check_schedule_overlap EXCLUDE USING gist (
        warehouse_dock_id WITH =,
        operating_time WITH &&
    ) WHERE (activity_type IN ('INBOUND', 'OUTBOUND'))
);

CREATE INDEX idx_truck_schedule_dock_time ON truck_schedule USING gist (warehouse_dock_id, operating_time);

The schema above utilizes the tstzrange data type, which is not supported in the current Prisma version. If you use TypeORM, it supports this type but remains limited when creating this specific Constraint combined with GIST and Partial Index in Postgres
Therefore, the solution is to create the database schema first and then seed data independently using seed/index.ts with the following content

import {faker} from '@faker-js/faker'
import pgPromise from 'pg-promise'

const pgp = pgPromise()
const db = pgp(process.env.DATABASE_URL!)

const csDock = new pgp.helpers.ColumnSet(['dock_name', 'location'], {
  table: 'warehouse_dock',
})

const csSchedule = new pgp.helpers.ColumnSet
  [
    'truck_plate_number',
    'warehouse_dock_id',
    'operating_time',
    'activity_type',
  ],
  {table: 'truck_schedule'}
)

async function main() {
  try {
    console.log('🧹 Clearing old data...')
    await db.none(
      'TRUNCATE TABLE truck_schedule, warehouse_dock RESTART IDENTITY CASCADE;'
    )

    console.log('🏭 Generating 100 warehouse docks...')
    const dockData = Array.from({length: 100}, () => ({
      dock_name: faker.company.name() + ' Dock',
      location: faker.location.streetAddress(),
    }))

    const queryDock = pgp.helpers.insert(dockData, csDock) + ' RETURNING id'
    const createdDocks = await db.many(queryDock)
    const dockIds = createdDocks.map((dock: {id: number}) => dock.id)
    console.log(`✅ Inserted ${dockIds.length} warehouse docks.`)

    const TOTAL_SCHEDULES = 1_000_000
    const BATCH_SIZE = 50_000
    const STEPS = TOTAL_SCHEDULES / BATCH_SIZE

    const dockTimelines = new Map<number, Date>()
    dockIds.forEach(id => {
      dockTimelines.set(id, new Date('2026-01-20T00:00:00.000Z'))
    })

    console.log(
      `📦 Generating ${TOTAL_SCHEDULES.toLocaleString()} truck schedules via ${STEPS} sequential batches...`
    )

    for (let step = 0; step < STEPS; step++) {
      const scheduleData = Array.from({length: BATCH_SIZE}, () => {
        const randomDockId = faker.helpers.arrayElement(dockIds)
        const currentTimeline = dockTimelines.get(randomDockId)!
        const start = new Date(currentTimeline.getTime())
        const durationHours = faker.number.int({min: 2, max: 5})
        const end = new Date(start.getTime() + durationHours * 60 * 60 * 1000)

        const restMinutes = faker.number.int({min: 15, max: 45})
        const nextStartTimeline = new Date(
          end.getTime() + restMinutes * 60 * 1000
        )

        boxTimelines.set(randomDockId, nextStartTimeline)

        return {
          truck_plate_number: faker.vehicle.vrm(),
          warehouse_dock_id: randomDockId,
          operating_time: `[${start.toISOString()},${end.toISOString()})`,
          activity_type: faker.helpers.arrayElement([
            'INBOUND',
            'OUTBOUND',
            'OUT_FOR_DELIVERY',
          ]),
        }
      })

      const querySchedule =
        pgp.helpers.insert(scheduleData, csSchedule) +
        ' ON CONFLICT ON CONSTRAINT check_schedule_overlap DO NOTHING;'

      await db.none(querySchedule)
      console.log(
        `⚡ Finished Batch ${step + 1}/${STEPS} (${((step + 1) * BATCH_SIZE).toLocaleString()} rows)`
      )
    }
    console.log('🚀 100,000 records seeded successfully without any conflicts!')
  } catch (error) {
    console.error('❌ Seeding Error:', error)
  } finally {
    pgp.end()
  }
}

main()

The code snippet above leverages faker to generate mock data and inserts 100 records into the warehouse_dock table. After retrieving the dockIds, they are used as foreign keys to insert into the truck_schedule table
It inserts 1,000,000 records into the truck_schedule table with a batch size of 50,000 records per iteration

Create a .env file

DATABASE_URL = postgresql://{USER}:{PASSWORD}@{HOST}:{PORT}/{DATABASE_NAME}?schema={SCHEMA_NAME}

Add the script to package.json

{
  "scripts": {
    "db:seed": "ts-node -r dotenv/config seed/index.ts"
  }
}

Consequently, you only need to execute that script in the future to seed data

yarn db:seed
yarn run v1.22.22
$ ts-node -r dotenv/config seed/index.ts
🧹 Clearing old data...
🏭 Generating 100 warehouse docks...
✅ Inserted 100 warehouse docks.
📦 Generating 1,000,000 truck schedules via 20 sequential batches...
⚡ Finished Batch 1/20 (50,000 rows)
⚡ Finished Batch 2/20 (100,000 rows)
⚡ Finished Batch 3/20 (150,000 rows)
⚡ Finished Batch 4/20 (200,000 rows)
⚡ Finished Batch 5/20 (250,000 rows)
⚡ Finished Batch 6/20 (300,000 rows)
⚡ Finished Batch 7/20 (350,000 rows)
⚡ Finished Batch 8/20 (400,000 rows)
⚡ Finished Batch 9/20 (450,000 rows)
⚡ Finished Batch 10/20 (500,000 rows)
⚡ Finished Batch 11/20 (550,000 rows)
⚡ Finished Batch 12/20 (600,000 rows)
⚡ Finished Batch 13/20 (650,000 rows)
⚡ Finished Batch 14/20 (700,000 rows)
⚡ Finished Batch 15/20 (750,000 rows)
⚡ Finished Batch 16/20 (800,000 rows)
⚡ Finished Batch 17/20 (850,000 rows)
⚡ Finished Batch 18/20 (900,000 rows)
⚡ Finished Batch 19/20 (950,000 rows)
⚡ Finished Batch 20/20 (1,000,000 rows)
🚀 100,000 records seeded successfully without any conflicts!
✨  Done in 26.39s.

Happy coding!

See more articles here.