Show HN: Market Processing Engine in C++20 (150M orders/sec)
Author here. This started as a quest to see how fast I could push a single core on an M1 Pro. I built an order matching engine from scratch using C++20. Initially, it did ~100k ops/sec. After a month of optimization, it now does ~156M ops/sec.

Key optimizations:

- Removed all mutexes (shard-per-core architecture).
- Custom lock-free SPSC ring buffer for thread communication.
- Replaced std::map with flat vectors + bitset scanning (using CTZ intrinsics).
- Zero-allocation hot path using std::pmr (polymorphic memory resources) on the stack.

To prove it handles real markets (not just random numbers), I verified it by replaying captured Binance L3 market data (132M ops/sec).

Detailed write-up of the optimization journey here: https://medium.com/@kpiyush8826/how-i-optimized-a-c-matching...

Happy to answer questions!