Streaming Engine to 5k+ Users (1) - Debugging a Race Condition in SessionManager.
Test Goal
To confirm system stability using a smoke test with a small number of VUs.
Conclusion
I spotted a race condition in SessionManager and fixed it. The system is now ready for the larger-scale test.
If you are interested in the debugging process, please continue reading.
Smoke Test for Spike Test Round
To start, I reduced the VUs to a small number:
export const options = {
  stages: [
    { duration: '3m', target: 100 },  // Ramp-up: from 0
    { duration: '1m', target: 100 },  // Steady state: observe baseline
    { duration: '30s', target: 150 }, // Spike: sudden influx of concurrent users
    { duration: '1m', target: 150 },  // Sustained peak: test lock contention
    { duration: '10s', target: 0 },   // Recovery: rapid scale down
  ],
};
Observations
K6 Test Results ❌
The success rate of HTTP sessions created (via Simple POST) is only 97%.
Performance Degradation ❌
The P99 latency for the registerSession endpoint spiked to nearly 400ms, indicating severe lock contention or that the system is struggling under load.
Close of closed channel ❌
The Go server repeatedly threw a fatal error: panic: close of closed channel.
2026/03/31 19:22:55 close of closed channel
2026/03/31 19:22:55 Stack: File: /build/mods/bc11-market-monitor/useCases/SessionManager.go:70 Func: (*sessionManager).registerSession
File: /build/mods/bc11-market-monitor/useCases/SubscribeStockPrice.go:45 Func: (*SubscribeStockPriceUseCase).Executes
File: /build/mods/bc11-market-monitor/controllers/StockPrice.go:22 Func: (*Controller).StartPriceMonitoring
File: /build/shared/go/protos/bc11_market_monitor/bc11_market_monitor.service.pb.go:464 Func: lzstock/shared/go/protos/bc11_market_monitor._RealTimePriceMonitoringService_StartPriceMonitoring_Handler.func1
File: /build/shared/go/infra/grpcServer/interceptor.go:18 Func: (*Server).UnaryInterceptor
File: /build/shared/go/protos/bc11_market_monitor/bc11_market_monitor.service.pb.go:466 Func: lzstock/shared/go/protos/bc11_market_monitor._RealTimePriceMonitoringService_StartPriceMonitoring_Handler
File: /go/pkg/mod/google.golang.org/[email protected]/server.go:1431 Func: (*Server).processUnaryRPC
File: /go/pkg/mod/google.golang.org/[email protected]/server.go:1842 Func: (*Server).handleStream
File: /go/pkg/mod/google.golang.org/[email protected]/server.go:1061 Func: (*Server).serveStreams.func2.1()
File: /go/pkg/mod/google.golang.org/[email protected]/server.go:1072 Func: (*Server).serveStreams.func2 in goroutine 102
2026/03/31 19:22:55 start price monitoring: Error: register session: Error: close of closed channel
Debugging
Thanks to the clear error logs, I could easily locate the issue: the message indicated the file path, line number, and function name.
...
2026/03/31 19:22:55 Stack:
File: /build/mods/bc11-market-monitor/useCases/SessionManager.go:70
Func: (*sessionManager).registerSession
...
If you are interested in how the full stack trace is produced, see the stackerr repo on the LZStock GitHub.
One of the call sites of deactivateSession is the registerSession function.
func (sm *sessionManager) registerSession(session *session) (err error) {
	sm.mu.Lock()
	...
	if len(userSessions) >= sm.maxSessionsPerUser {
		...
		// The manager lock is released here, opening a window in which
		// another goroutine can deactivate the same session.
		sm.mu.Unlock()
->		sm.deactivateSession(oldestSessionID)
		sm.mu.Lock()
		...
	}
	...
	sm.mu.Unlock()
}
Inside the deactivateSession function, close() is called on the session's channel; calling close() on a channel that has already been closed instantly causes a panic.
func (sm *sessionManager) deactivateSession(sessionID string) {
	sm.mu.RLock()
	session := sm.sessions[sessionID]
	sm.mu.RUnlock()
	if session == nil {
		return
	}

	// Lock for session modification
	session.mu.Lock()
	session.IsActive = false
	traderID := session.TraderID
	// Remove Session Channel from RedisManager
	for _, ticker := range session.Tickers {
		sm.redisSubManager.RemoveSubscription(ticker, session.PriceStream)
	}
->	close(session.PriceStream)
	session.mu.Unlock()

	// Lock for sessionsByUser map modification
	sm.mu.Lock()
	delete(sm.sessions, sessionID)
	if userSessions, exists := sm.sessionsByUser[traderID]; exists {
		for i, id := range userSessions {
			if id == sessionID {
				sm.sessionsByUser[traderID] = append(userSessions[:i], userSessions[i+1:]...)
				break
			}
		}
		if len(sm.sessionsByUser[traderID]) == 0 {
			delete(sm.sessionsByUser, traderID)
		}
	}
	sm.mu.Unlock()
}
The root cause is that multiple goroutines call deactivateSession simultaneously. Let's break down the problem step by step.
Root Cause Analysis: The Race Condition
In this case, the issue is most likely caused by multiple goroutines calling deactivateSession simultaneously, leading to a classic race condition.
The Callers:
- API Goroutine (registerSession): When a user reaches their maximum allowed concurrent sessions, the system actively kicks out their oldest session.
- Background Goroutine (cleanupExpiredSessions): A cron-like job that periodically scans for and removes timed-out sessions.
- Websocket Connection Close: When a user closes the websocket connection, the system will call deactivateSession to clean up the session.
The Race Condition Flow:
- A specific session (S1) expires.
- The background worker detects S1 is expired and calls deactivateSession("S1").
- At the exact same time, the user logs in again. The API detects the connection limit is reached, targets the oldest session (S1), and also calls deactivateSession("S1").
- Both goroutines pass the initial if session == nil check, because S1 has not yet been deleted from the map.
- Goroutine A acquires the session.mu.Lock(), executes close(session.PriceStream), unlocks, and proceeds.
- Goroutine B immediately acquires the lock and executes close(session.PriceStream) again.
- 💥 Result: panic: close of closed channel.
Solution: Idempotent Deactivation
Instead of trying to prevent these callers from executing simultaneously, I modified deactivateSession to be idempotent—meaning it can be safely called multiple times without causing errors.
func (sm *sessionManager) deactivateSession(sessionID string) {
	...
	// Acquire session-level lock
	session.mu.Lock()
+	if !session.IsActive {
+		session.mu.Unlock()
+		return // Session was already deactivated by another goroutine
+	}
	// Mark as deactivated immediately
	session.IsActive = false
	traderID := session.TraderID
	...
}
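The IsActive flag works because every path that touches it does so while holding session.mu. An alternative worth noting is to guard the close with sync.Once, which makes a double close impossible even without a flag. A minimal sketch with a simplified session type (hypothetical fields, not the actual struct):

```go
package main

import (
	"fmt"
	"sync"
)

// session is simplified; only the fields relevant to shutdown are shown.
type session struct {
	closeOnce   sync.Once
	PriceStream chan float64
}

// closeStream is safe to call any number of times from any goroutine:
// sync.Once guarantees the close executes exactly once.
func (s *session) closeStream() {
	s.closeOnce.Do(func() { close(s.PriceStream) })
}

func main() {
	s := &session{PriceStream: make(chan float64)}
	s.closeStream()
	s.closeStream() // no-op, no panic
	fmt.Println("closed twice safely")
}
```

Both approaches are idempotent; the flag has the extra benefit that the losing goroutine returns early and also skips the Redis unsubscribe loop and the map cleanup.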
Result
K6 Test Results ✅
The success rate of HTTP sessions created is now 100%.
Performance Degradation ✅
The P99 latency for the registerSession endpoint stayed low, indicating that all session creations finished without blocking.
No more close of closed channel ✅
No more close of closed channel panic. Perfect!!
Next: Leaked Redis Pub/Sub Connections
Everything looks solid. But new issues are beginning to surface: I found that Redis Pub/Sub connections weren't closed after all sessions dropped. Let's debug it.
