Skip to main content

Database Optimization: Eradicating the ORM N+1 Problem

How LZStock prevents database connection exhaustion by utilizing GORM Preloading and Application-Level Joins instead of massive SQL Cartesian products.

TL;DR
  • Kill the Query Waterfall: Replaced naive O(N×M)O(N \times M) loops with GORM's Preload mechanism, reducing hundreds of sequential, high-latency database hits to a strict, batch-optimized O(1)O(1) query flow.
  • Eliminate Cartesian Product Bloat: Favored application-level joins over massive SQL LEFT JOIN statements to prevent redundant data duplication across the network, shifting the "stitching" CPU workload to horizontally scalable Go application pods.
  • Hybrid Optimization Strategy: Balanced DB-level sorting via eager-loading closures while establishing a clear fallback rule: utilize SQL INNER JOIN or EXISTS specifically for cases requiring filtering by child attributes that independent queries cannot handle.

The Objective

The convenience of Object-Relational Mappers (ORMs) like GORM comes with a deadly trap: The N+1 Query Problem.

In our core domain, fetching a single Dashboard requires loading its Watchlists, which in turn require loading multiple USCompanies. A naive implementation using a loop triggers 1 query for the dashboard, N queries for the watchlists, and M queries for the companies—potentially executing 50+ sequential queries for a single API call.

Under high concurrency, this instantly exhausts the PostgreSQL connection pool and spikes latency. The objective is to enforce strict Batch Fetching (Eager Loading), reducing the query complexity from O(N×M)O(N \times M) down to a strict, predictable O(1)O(1) (exactly 3 queries).

The Mental Model

Instead of executing a loop, the system executes 3 isolated batch queries and stitches the relationships together in the Go application's memory (Application-Level Join).

N+1 Trap Example

Click to reveal the N+1 Disaster in code
func BadExample(id uuid.UUID) {
var dashboard Dashboard

// 1. Load Dashboard (1 Query)
db.Where("id = ?", id).First(&dashboard)

// 2. Load Watchlists (1 Query)
db.Where("dashboard_id = ?", id).Find(&dashboard.Watchlists)

// 3. Load Companies in a loop (N Queries! 💥)
for i := range dashboard.Watchlists {
db.Model(&dashboard.Watchlists[i]).
Association("USCompanyRefs").
Find(&dashboard.Watchlists[i].USCompanyRefs)
}
}

Core Implementation

Below is the highly curated data access logic utilizing GORM's Preload. Notice how we inject custom SQL scoping (like ORDER BY) directly into the eager-loading closures to maintain sorting at the database level.

// internal/repository/pg/dashboard_repo.go

func (r *DashboardRepo) GetDashboardWithRelations(ctx context.Context, id uuid.UUID) (*schema.Dashboard, error) {
var dashboard schema.Dashboard

// Executes exactly 3 optimized queries using the SQL 'IN' operator
err := r.db.WithContext(ctx).
// Query 2: Fetch all Watchlists
Preload("Watchlists", func(db *gorm.DB) *gorm.DB {
return db.Order("created_at DESC")
}).
// Query 3: Fetch all Companies linked to those Watchlists
Preload("Watchlists.USCompanyRefs", func(db *gorm.DB) *gorm.DB {
return db.Order("us_companies.company_name ASC")
}).
Where("id = ?", id).
First(&dashboard).Error

if err != nil {
return nil, fmt.Errorf("failed to fetch dashboard relations: %w", err)
}

return &dashboard, nil
}

Edge Cases & Trade-offs

  • Preload (App-Level Join) vs. SQL JOIN (Cartesian Product): Why did I choose Preload instead of writing a single LEFT JOIN query?
    • If a Dashboard has 10 Watchlists, and each has 50 Companies, a single JOIN produces 500 rows. The Dashboard and Watchlist columns are heavily duplicated across all 500 rows, causing massive network bandwidth bloat (The Cartesian Product Effect).
    • Preload executes 3 separate queries and prevents this data duplication, shifting the stitching CPU load from the Database to the Go Application—which is horizontally scalable and cheaper.
  • When JOIN is Mandatory: The fatal flaw of Preload is that the queries are independent. If the business logic requires filtering the parent based on a child's attribute (e.g., "Fetch only Dashboards that contain 'TSLA' in any of their watchlists"), Preload cannot do this. In such edge cases, the architecture strictly mandates falling back to an INNER JOIN or EXISTS subquery to utilize the database's relational engine.

The Outcome

By enforcing strict GORM Preloading strategies across the Repository layer, deeply nested aggregate roots are fully hydrated in a guaranteed maximum of 3 queries, completely neutralizing connection pool exhaustion under heavy WebSocket/API traffic.