
Daniel Lin is a quick learner who grasps new concepts efficiently. In 2025, he spent 100 days building his first enterprise-level project, LZStock, a stock screener that helps investors make better decisions.


Streaming Engine to 5k+ Users (8) - Summary

· 8 min read

After a series of rigorous load tests, including smoke tests, a 1.5k Virtual Users (VUs) spike test, and a 5k VUs capacity test, the streaming engine architecture has proven its resilience and performance. Here is a summary of the journey, the bottlenecks discovered, and the solutions implemented.

Figure: Summary Chart

Streaming Engine to 5k+ Users (7) - Debugging Long-tail Resource Leaks

· 3 min read

Test Goal

A long-tail resource leak may affect performance. It appears when the server is left idle for a while after a load test without being restarted.

Conclusion

I fixed the long-tail resource leak. No error logs appear after the load test anymore.

If you are interested in the testing process, please continue reading.

Streaming Engine to 5k+ Users (6) - Micro-optimization

· 2 min read

Test Goal

A type conversion issue may affect performance. Let's fix it and observe whether performance improves.

Conclusion

I fixed the type conversion issue. The P99 latency drops from roughly 700ms to roughly 450ms, and CPU usage drops slightly, from roughly 0.8 to 0.5.

If you are interested in the testing process, please continue reading.

Streaming Engine to 5k+ Users (5) - Performance Tuning

· 3 min read

Test Goal

A minor issue in the price channel management may affect performance when connections are rapidly created and destroyed. Let's fix it and observe whether the latency metrics improve.

Conclusion

I updated the price channel management to use a map keyed by sessionID, which is more efficient than the previous array. The P99 latency is reduced from above 1 second to around 800ms.
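To illustrate why a map keyed by session ID can beat an array here, the following is a minimal Python sketch, not the engine's actual code. The class names `ArrayRegistry` and `MapRegistry` and the string channel values are hypothetical. The point is only that removing a channel by session ID scans the whole list in one case and takes a single hash lookup in the other, which matters when connections are created and destroyed rapidly.

```python
from typing import Dict, List, Optional, Tuple


class ArrayRegistry:
    """Hypothetical 'before' version: channels kept in a list."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, str]] = []  # (session_id, channel)

    def add(self, session_id: str, channel: str) -> None:
        self._items.append((session_id, channel))

    def remove(self, session_id: str) -> bool:
        # O(n): every disconnect scans the list to find its entry.
        for i, (sid, _) in enumerate(self._items):
            if sid == session_id:
                del self._items[i]
                return True
        return False

    def __len__(self) -> int:
        return len(self._items)


class MapRegistry:
    """Hypothetical 'after' version: channels keyed by session ID."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    def add(self, session_id: str, channel: str) -> None:
        self._items[session_id] = channel

    def get(self, session_id: str) -> Optional[str]:
        return self._items.get(session_id)

    def remove(self, session_id: str) -> bool:
        # O(1) on average: a single hash lookup, no scan.
        return self._items.pop(session_id, None) is not None

    def __len__(self) -> int:
        return len(self._items)
```

With thousands of sessions connecting and disconnecting, the list version pays a scan per disconnect, while the map version keeps that cost roughly constant. If the registry is shared across threads, it would still need its own locking, which this sketch leaves out.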

If you are interested in the testing process, please continue reading.

Streaming Engine to 5k+ Users (4) - Shifting to Capacity Test

· 3 min read

Test Goal

The next step is the last part of the load test: a capacity test with 5,000 concurrent users (CCU).

Conclusion

The system can now ramp up to 5,000 CCU within 5 minutes and hold that load for another 5 minutes. All of them continuously receive messages over WebSocket.
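The shape of this test (5 minutes of ramp-up, then 5 minutes of hold) can be written down as a small load profile. The sketch below is only an illustration in Python: the function name `target_users` is hypothetical, and the linear ramp is an assumption, since the real test may ramp in steps.

```python
RAMP_SECONDS = 5 * 60   # ramp-up window from the test description
HOLD_SECONDS = 5 * 60   # hold window from the test description
PEAK_USERS = 5000       # target concurrent users (CCU)


def target_users(elapsed_seconds: float) -> int:
    """Return the planned number of connected users at a given time.

    Assumes a linear ramp; the real test may ramp in steps.
    """
    if elapsed_seconds <= 0:
        return 0
    if elapsed_seconds < RAMP_SECONDS:
        return int(PEAK_USERS * elapsed_seconds / RAMP_SECONDS)
    if elapsed_seconds <= RAMP_SECONDS + HOLD_SECONDS:
        return PEAK_USERS
    return 0  # test finished
```

Under the linear-ramp assumption, the server has to accept about 5000 / 300 ≈ 16.7 new connections per second during the first 5 minutes, and then keep all 5,000 streams alive for the hold phase.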

If you are interested in the testing process, please continue reading.

Streaming Engine to 5k+ Users (0) - Set up

· 19 min read

Theory rarely aligns perfectly with production reality. I will write a series of articles that walk you through the exact process of how I hunted down a fatal lock contention bottleneck under a massive concurrency spike.

Test Goal

The primary goal of this load test is to prove that the real-time streaming engine remains stable and responsive under massive traffic. The test specifically evaluates the system's ability to seamlessly handle two core user flows simultaneously:

  • Flow 1: Session Creation. Users successfully hitting the REST API to create a monitoring session for their chosen stock tickers.
  • Flow 2: Real-Time Streaming. Users establishing a WebSocket connection using that generated session ID to continuously receive live price updates.
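The two flows can be pictured with a small in-memory stand-in, written in Python purely for illustration. Everything named here is hypothetical: `SessionStore`, `stream_url`, the `/stream` route, and the `session_id` query parameter are assumptions, not the engine's real API. The sketch only shows the dependency between the flows: Flow 2 cannot start until Flow 1 has returned a session ID.

```python
import uuid
from typing import Dict, List
from urllib.parse import urlencode


class SessionStore:
    """In-memory stand-in for the session-creation REST endpoint."""

    def __init__(self) -> None:
        self._sessions: Dict[str, List[str]] = {}

    def create(self, tickers: List[str]) -> str:
        # Flow 1: create a monitoring session and return its ID.
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = list(tickers)
        return session_id

    def tickers_for(self, session_id: str) -> List[str]:
        # Flow 2 would look the session up when the WebSocket connects.
        return self._sessions[session_id]


def stream_url(base: str, session_id: str) -> str:
    # Hypothetical URL shape; the real route and parameter name may differ.
    return f"{base}/stream?{urlencode({'session_id': session_id})}"
```

In a load test, each virtual user would perform the equivalent of `create(...)` over HTTP first, then open a WebSocket to the URL built from the returned ID and keep reading price updates.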

Figure: bc11-sub-flow

Figure: bc11-stream-flow

Recap

Check how I designed the Price Streaming Engine before the load test (Article: Price Streaming Engine).

Metrics

Figure: dashboard overview

To truly prove the architecture's resilience, I looked at specific Grafana charts to track my custom application metrics and OS-level hardware metrics...