Database Optimization: Eradicating the ORM N+1 Problem

How LZStock prevents database connection exhaustion by utilizing GORM Preloading and Application-Level Joins instead of massive SQL Cartesian products.

TL;DR

Kill the Query Waterfall: Replaced naive $O(N \times M)$ loops with GORM's Preload mechanism, reducing hundreds of sequential, high-latency database hits to a strict, batch-optimized $O(1)$ query flow.
Eliminate Cartesian Product Bloat: Favored application-level joins over massive SQL LEFT JOIN statements to prevent redundant data duplication across the network, shifting the "stitching" CPU workload to horizontally scalable Go application pods.
Hybrid Optimization Strategy: Balanced DB-level sorting via eager-loading closures while establishing a clear fallback rule: utilize SQL INNER JOIN or EXISTS specifically for cases requiring filtering by child attributes that independent queries cannot handle.

The Objective

The convenience of Object-Relational Mappers (ORMs) like GORM comes with a deadly trap: The N+1 Query Problem.

In our core domain, fetching a single Dashboard requires loading its Watchlists, which in turn require loading multiple USCompanies. A naive implementation using a loop triggers 1 query for the dashboard, N queries for the watchlists, and M queries for the companies—potentially executing 50+ sequential queries for a single API call.

Under high concurrency, this instantly exhausts the PostgreSQL connection pool and spikes latency. The objective is to enforce strict Batch Fetching (Eager Loading), reducing the query complexity from $O(N \times M)$ down to a strict, predictable $O(1)$ (exactly 3 queries).

The Mental Model

Instead of executing a loop, the system executes 3 isolated batch queries and stitches the relationships together in the Go application's memory (Application-Level Join).

N+1 Trap Example

Click to reveal the N+1 Disaster in code

GORM (The Naive Loop)
Translated Raw SQL

func BadExample(id uuid.UUID) {
    var dashboard Dashboard
    
    // 1. Load Dashboard (1 Query)
    db.Where("id = ?", id).First(&dashboard)  

    // 2. Load Watchlists (1 Query)
    db.Where("dashboard_id = ?", id).Find(&dashboard.Watchlists)  

    // 3. Load Companies in a loop (N Queries! 💥)
    for i := range dashboard.Watchlists {
        db.Model(&dashboard.Watchlists[i]).
          Association("USCompanyRefs").
          Find(&dashboard.Watchlists[i].USCompanyRefs)
    }
}

-- 1. Dashboard
SELECT * FROM dashboards WHERE id = 'uuid';

-- 2. Watchlists
SELECT * FROM watchlists WHERE dashboard_id = 'uuid';

-- 3. The N+1 Avalanche... (Exhausting the connection pool)
SELECT * FROM watchlist_companies wc JOIN us_companies uc ON ... WHERE wc.watchlist_id = 'watchlist-1';
SELECT * FROM watchlist_companies wc JOIN us_companies uc ON ... WHERE wc.watchlist_id = 'watchlist-2';
SELECT * FROM watchlist_companies wc JOIN us_companies uc ON ... WHERE wc.watchlist_id = 'watchlist-3';
-- ... repeats N times for every single watchlist!

Core Implementation

Below is the highly curated data access logic utilizing GORM's Preload. Notice how we inject custom SQL scoping (like ORDER BY) directly into the eager-loading closures to maintain sorting at the database level.

GORM (Optimized Preload)
Translated Raw SQL

// internal/repository/pg/dashboard_repo.go

func (r *DashboardRepo) GetDashboardWithRelations(ctx context.Context, id uuid.UUID) (*schema.Dashboard, error) {
    var dashboard schema.Dashboard

    // Executes exactly 3 optimized queries using the SQL 'IN' operator
    err := r.db.WithContext(ctx).
        // Query 2: Fetch all Watchlists
        Preload("Watchlists", func(db *gorm.DB) *gorm.DB {
            return db.Order("created_at DESC")
        }).
        // Query 3: Fetch all Companies linked to those Watchlists
        Preload("Watchlists.USCompanyRefs", func(db *gorm.DB) *gorm.DB {
            return db.Order("us_companies.company_name ASC")
        }).
        Where("id = ?", id).
        First(&dashboard).Error

    if err != nil {
        return nil, fmt.Errorf("failed to fetch dashboard relations: %w", err)
    }

    return &dashboard, nil
}

-- Exactly 3 highly predictable, batch-optimized queries:

-- 1. Fetch Dashboard
SELECT * FROM dashboards WHERE id = 'uuid' LIMIT 1;

-- 2. Fetch ALL related Watchlists (Batched)
SELECT * FROM watchlists 
WHERE dashboard_id = 'uuid' 
ORDER BY created_at DESC;

-- 3. Fetch ALL related Companies (Batched via the IN clause!)
SELECT wc.*, uc.* FROM watchlist_companies wc
JOIN us_companies uc ON wc.us_company_id = uc.id
WHERE wc.watchlist_id IN ('watchlist-1', 'watchlist-2', 'watchlist-3', ...) 
ORDER BY uc.company_name ASC;

Edge Cases & Trade-offs

Preload (App-Level Join) vs. SQL JOIN (Cartesian Product): Why did I choose Preload instead of writing a single LEFT JOIN query?
- If a Dashboard has 10 Watchlists, and each has 50 Companies, a single JOIN produces 500 rows. The Dashboard and Watchlist columns are heavily duplicated across all 500 rows, causing massive network bandwidth bloat (The Cartesian Product Effect).
- Preload executes 3 separate queries and prevents this data duplication, shifting the stitching CPU load from the Database to the Go Application—which is horizontally scalable and cheaper.
When JOIN is Mandatory: The fatal flaw of Preload is that the queries are independent. If the business logic requires filtering the parent based on a child's attribute (e.g., "Fetch only Dashboards that contain 'TSLA' in any of their watchlists"), Preload cannot do this. In such edge cases, the architecture strictly mandates falling back to an INNER JOIN or EXISTS subquery to utilize the database's relational engine.

The Outcome

By enforcing strict GORM Preloading strategies across the Repository layer, deeply nested aggregate roots are fully hydrated in a guaranteed maximum of 3 queries, completely neutralizing connection pool exhaustion under heavy WebSocket/API traffic.

The Objective​

The Mental Model​

N+1 Trap Example​

Core Implementation​

Edge Cases & Trade-offs​

The Outcome​