- PyPI: https://pypi.org/project/axon-api/
- GitHub: https://github.com/b-is-for-build/axon-api
- Deployment Script: https://github.com/b-is-for-build/axon-api/blob/master/examp...
Axon is a 507-line, pure Python WSGI framework that achieves up to 237MB/s file streaming on $8/month hardware. The key feature is the dynamic bundling of multiple files into a single multipart stream while maintaining bounded memory (<225MB). The implementation saturates CPU before reaching I/O limits.
Technical highlights:
- Pure Python stdlib implementation (no external dependencies)
- HTTP range support for partial content delivery
- Generator-based streaming with constant memory usage
- Request batching via query parameters
- Match statement-based routing (eliminates traversal and probing)
- Built-in sanitization and structured logging
The benchmarking methodology uses fresh Digital Ocean droplets with reproducible wrk tests across different file sizes. All code and deployment scripts are included.
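To make the core idea concrete, here is a minimal sketch of generator-based multipart bundling over WSGI. This is not Axon's actual code; the route, the `file` query parameter, and the function names are illustrative assumptions. The point is that the response body is a generator that yields fixed-size chunks, so memory stays bounded regardless of how many files are bundled or how large they are.

```python
import mimetypes
import uuid
from urllib.parse import parse_qs

CHUNK_SIZE = 64 * 1024  # stream in fixed-size chunks so memory stays bounded


def multipart_stream(paths, boundary):
    """Yield a multipart/mixed body one chunk at a time."""
    for path in paths:
        ctype = mimetypes.guess_type(path)[0] or "application/octet-stream"
        yield (f"--{boundary}\r\n"
               f"Content-Type: {ctype}\r\n"
               f"Content-Disposition: attachment; filename=\"{path}\"\r\n"
               f"\r\n").encode()
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                yield chunk
        yield b"\r\n"
    yield f"--{boundary}--\r\n".encode()


def app(environ, start_response):
    # e.g. GET /bundle?file=a.bin&file=b.bin selects the files to bundle
    query = parse_qs(environ.get("QUERY_STRING", ""))
    paths = query.get("file", [])
    boundary = uuid.uuid4().hex
    start_response("200 OK",
                   [("Content-Type", f"multipart/mixed; boundary={boundary}")])
    return multipart_stream(paths, boundary)
```

Because the WSGI server consumes the returned generator lazily, at most one chunk per file is in memory at a time.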
Is this supposed to be a pro?
The multipart streaming workload is inherently expensive. The cost of generating boundaries and constructing headers scales with request count and payload size. The architecture demonstrates efficient resource utilization: bounded memory usage (<225MB) while maximizing CPU throughput.
CPU saturation with bounded memory means performance scales predictably with processing power. On multicore systems, you can leverage multiple processes to effectively utilize all cores. Alternatively, you can distribute the workload horizontally using droplets as cost-efficient instances.
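As a rough illustration of the multi-process approach, the stdlib alone can fork a worker per connection; the class and function names below are illustrative, not part of Axon.

```python
from socketserver import ForkingMixIn
from wsgiref.simple_server import WSGIServer, make_server


class ForkingWSGIServer(ForkingMixIn, WSGIServer):
    """Handle each connection in a forked child so all cores can be used."""
    pass


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]


def main():
    with make_server("", 8000, app, server_class=ForkingWSGIServer) as httpd:
        httpd.serve_forever()
```

Forking per connection is the simplest stdlib option; a pre-fork server (e.g. gunicorn with multiple workers) amortizes the fork cost and is the more common way to run a WSGI app across cores in production.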
Streaming workloads are not inherently expensive. The main work is getting the bytes of the files to the network card as quickly as possible, and almost no computation needs to be performed.
> The cost of generating boundaries and constructing headers scales with request count and payload size.
The only computation needed to generate a boundary is checking that it does not occur in the content, and the code does not actually check this; it just generates a random UUID4. Boundaries and headers are per-file and could be cached, so they don't have to scale with request count or payload size.
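The caching suggested above can be sketched as precomputing each file's part header once. This assumes a single process-wide boundary (rather than one per response); the names here are illustrative, not Axon's.

```python
import functools
import mimetypes
import uuid

# One random boundary per process. Like the code under discussion, this does
# not verify the boundary is absent from file content; a UUID4 colliding with
# real data is vanishingly unlikely.
BOUNDARY = uuid.uuid4().hex


@functools.lru_cache(maxsize=4096)
def part_header(path):
    """Per-file part header: built once, then reused across requests."""
    ctype = mimetypes.guess_type(path)[0] or "application/octet-stream"
    return (f"--{BOUNDARY}\r\n"
            f"Content-Type: {ctype}\r\n"
            f"Content-Disposition: attachment; filename=\"{path}\"\r\n"
            f"\r\n").encode()
```

With this, the per-request cost of header construction drops to a dictionary lookup for any file that has been served before.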
The "inherently expensive" claim was overstated - it's expensive relative to static file serving, and unavoidable, but you're correct that the current implementation leaves room for optimization. I've identified three opportunities to improve the design: boundary generation, content-type assignment, and header construction.
One clarification: the dynamic bundling via query parameters limits traditional caching strategies. Since each request can specify different file combinations, the multipart response structure must be generated per-request rather than cached.
Axon is also intended as a core framework. How you implement caching depends on your specific use case, so it becomes business logic rather than request-processing logic. Its minimal tooling is meant to be a feature, though, as you have pointed out, it can also be limiting.
Thanks for pointing out the ambiguity.
The metric that actually matters is efficiency at the task given a hardware constraint. In this context, that's entirely network throughput (streaming ability per unit of hardware; with hardware held constant, you can compare streaming ability directly).
For a litmus test of the concept, if you rewrote this in C or Rust, would the CPU bottleneck earlier or later? Would the network throughput be closer or further from its bottleneck?
Lower-level languages would certainly offer higher performance. I was hoping to showcase how Python can perform when the architecture is constrained. The goal was to show that careful design choices (bounded memory, generator-based streaming) can maintain predictable behavior even when computational resources are exhausted.
So, I did look over the code and the thing that I walked away asking was "isn't this sort of the reason why sendfile(2) was developed?"
For static file serving, sendfile would be the better choice.
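For reference, the stdlib exposes sendfile(2) through `socket.sendfile`, which transfers file bytes to the socket without copying them through userspace where the OS supports it. A minimal sketch (the function name and hard-coded headers are illustrative):

```python
import os
import socket


def serve_file_zero_copy(conn: socket.socket, path: str) -> int:
    """Send a file over a connected socket, using sendfile(2) where available."""
    size = os.path.getsize(path)
    header = (f"HTTP/1.1 200 OK\r\n"
              f"Content-Length: {size}\r\n"
              f"Content-Type: application/octet-stream\r\n"
              f"\r\n").encode()
    conn.sendall(header)
    with open(path, "rb") as f:
        # socket.sendfile handles partial sends internally and returns the
        # total number of file bytes transmitted.
        return conn.sendfile(f)
```

The trade-off relative to the multipart approach: sendfile avoids the read/yield loop entirely, but it only ships raw file bytes, so dynamically generated framing (multipart boundaries, per-part headers) still has to go through ordinary writes.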