- PyPI: https://pypi.org/project/axon-api/
- GitHub: https://github.com/b-is-for-build/axon-api
- Deployment Script: https://github.com/b-is-for-build/axon-api/blob/master/examp...
Axon is a 507-line, pure Python WSGI framework that achieves up to 237MB/s file streaming on $8/month hardware. The key feature is the dynamic bundling of multiple files into a single multipart stream while maintaining bounded memory (<225MB). The implementation saturates CPU before reaching I/O limits.
Technical highlights:
- Pure Python stdlib implementation (no external dependencies)
- HTTP range support for partial content delivery
- Generator-based streaming with constant memory usage
- Request batching via query parameters
- Match statement-based routing (eliminates traversal and probing)
- Built-in sanitization and structured logging
The benchmarking methodology uses fresh Digital Ocean droplets with reproducible wrk tests across different file sizes. All code and deployment scripts are included.
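To make the core idea concrete, here is a minimal sketch of generator-based multipart bundling over WSGI. This is not Axon's actual code; the route, the `file` query parameter, and the function names are illustrative assumptions. The point is that the response body is a generator that yields fixed-size chunks, so memory stays bounded regardless of how many files are bundled or how large they are.

```python
import mimetypes
import uuid
from urllib.parse import parse_qs

CHUNK_SIZE = 64 * 1024  # stream in fixed-size chunks so memory stays bounded


def multipart_stream(paths, boundary):
    """Yield a multipart/mixed body one chunk at a time."""
    for path in paths:
        ctype = mimetypes.guess_type(path)[0] or "application/octet-stream"
        yield (f"--{boundary}\r\n"
               f"Content-Type: {ctype}\r\n"
               f"Content-Disposition: attachment; filename=\"{path}\"\r\n"
               f"\r\n").encode()
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                yield chunk
        yield b"\r\n"
    yield f"--{boundary}--\r\n".encode()


def app(environ, start_response):
    # e.g. GET /bundle?file=a.bin&file=b.bin selects the files to bundle
    query = parse_qs(environ.get("QUERY_STRING", ""))
    paths = query.get("file", [])
    boundary = uuid.uuid4().hex
    start_response("200 OK",
                   [("Content-Type", f"multipart/mixed; boundary={boundary}")])
    return multipart_stream(paths, boundary)
```

Because the WSGI server consumes the returned generator lazily, at most one chunk per file is in memory at a time.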
Is this supposed to be a pro?
The multipart streaming workload is inherently expensive. The cost of generating boundaries and constructing headers scales with request count and payload size. The architecture demonstrates efficient resource utilization: bounded memory usage (<225MB) while maximizing CPU throughput.
CPU saturation with bounded memory means performance scales predictably with processing power. On multicore systems, you can leverage multiple processes to effectively utilize all cores. Alternatively, you can distribute the workload horizontally using droplets as cost-efficient instances.
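As a rough illustration of the multi-process approach, the stdlib alone can fork a worker per connection; the class and function names below are illustrative, not part of Axon.

```python
from socketserver import ForkingMixIn
from wsgiref.simple_server import WSGIServer, make_server


class ForkingWSGIServer(ForkingMixIn, WSGIServer):
    """Handle each connection in a forked child so all cores can be used."""
    pass


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]


def main():
    with make_server("", 8000, app, server_class=ForkingWSGIServer) as httpd:
        httpd.serve_forever()
```

Forking per connection is the simplest stdlib option; a pre-fork server (e.g. gunicorn with multiple workers) amortizes the fork cost and is the more common way to run a WSGI app across cores in production.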
Streaming workloads are not inherently expensive. The main work is getting the bytes of the files to the network card as quickly as possible, and almost no computation needs to be performed.
> The cost of generating boundaries and constructing headers scales with request count and payload size.
The only computation needed to generate a boundary is checking that it does not occur in the content, and the code does not actually check this; it just generates a random UUID4. Boundaries and headers are per-file and could be cached, so they don't have to scale with request count or payload size.
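The caching suggested above can be sketched as precomputing each file's part header once. This assumes a single process-wide boundary (rather than one per response); the names here are illustrative, not Axon's.

```python
import functools
import mimetypes
import uuid

# One random boundary per process. Like the code under discussion, this does
# not verify the boundary is absent from file content; a UUID4 colliding with
# real data is vanishingly unlikely.
BOUNDARY = uuid.uuid4().hex


@functools.lru_cache(maxsize=4096)
def part_header(path):
    """Per-file part header: built once, then reused across requests."""
    ctype = mimetypes.guess_type(path)[0] or "application/octet-stream"
    return (f"--{BOUNDARY}\r\n"
            f"Content-Type: {ctype}\r\n"
            f"Content-Disposition: attachment; filename=\"{path}\"\r\n"
            f"\r\n").encode()
```

With this, the per-request cost of header construction drops to a dictionary lookup for any file that has been served before.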
The "inherently expensive" claim was overstated - it's expensive relative to static file serving, and unavoidable, but you're correct that the current implementation leaves room for optimization. I've identified three opportunities to improve the design: boundary generation, content-type assignment, and header construction.
One clarification: the dynamic bundling via query parameters limits traditional caching strategies. Since each request can specify different file combinations, the multipart response structure must be generated per-request rather than cached.
Axon is also intended as a core framework. How you implement caching depends on your specific use case, so it becomes business logic rather than request-processing logic. Its minimal tooling is meant to be a feature, though, as you have pointed out, it can also be limiting.
Thanks for pointing out the ambiguity.
The metric that actually matters is efficiency at the task given a hardware constraint. In this context, that's entirely network throughput (streaming ability per unit of hardware; with hardware held constant, you can compare streaming ability directly).
For a litmus test of the concept, if you rewrote this in C or Rust, would the CPU bottleneck earlier or later? Would the network throughput be closer or further from its bottleneck?
Lower-level languages would certainly offer higher performance. I was hoping to showcase how Python can perform when the architecture is constrained. The goal was to show that careful design choices (bounded memory, generator-based streaming) can maintain predictable behavior even when computational resources are exhausted.
So, I did look over the code and the thing that I walked away asking was "isn't this sort of the reason why sendfile(2) was developed?"
For static file serving, sendfile would be the better choice.
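For reference, the stdlib exposes sendfile(2) through `socket.sendfile`, which transfers file bytes to the socket without copying them through userspace where the OS supports it. A minimal sketch (the function name and hard-coded headers are illustrative):

```python
import os
import socket


def serve_file_zero_copy(conn: socket.socket, path: str) -> int:
    """Send a file over a connected socket, using sendfile(2) where available."""
    size = os.path.getsize(path)
    header = (f"HTTP/1.1 200 OK\r\n"
              f"Content-Length: {size}\r\n"
              f"Content-Type: application/octet-stream\r\n"
              f"\r\n").encode()
    conn.sendall(header)
    with open(path, "rb") as f:
        # socket.sendfile handles partial sends internally and returns the
        # total number of file bytes transmitted.
        return conn.sendfile(f)
```

The trade-off relative to the multipart approach: sendfile avoids the read/yield loop entirely, but it only ships raw file bytes, so dynamically generated framing (multipart boundaries, per-part headers) still has to go through ordinary writes.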