I wonder how far down they can scale a diffusion LM? I've been playing with in-browser models, and the speed is painful.
But I wonder how Taalas' product can scale. Making a custom chip for a single tiny model is a very different problem from running trillion-parameter models for a billion users.
Roughly 53B transistors for every 8B params. For a 2T-param model, you'd need about 13 trillion transistors, assuming the scaling is linear. One chip uses 2.5 kW of power? That's the draw of roughly four H100s. How does it pull so much power?
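A quick back-of-envelope in Python, taking the 53B-per-8B ratio at face value and assuming linear scaling; the 700 W figure is the published H100 SXM TDP, everything else here is just redoing the arithmetic above:

```python
# Scale the reported 53B transistors / 8B params ratio to a 2T-param model,
# assuming transistor count grows linearly with parameter count.
TRANSISTORS_PER_PARAM = 53e9 / 8e9      # ~6.6 transistors per parameter

params = 2e12                           # hypothetical 2T-param model
print(f"transistors: {params * TRANSISTORS_PER_PARAM:.2e}")  # ~1.33e13, i.e. ~13T

# Compare the quoted 2.5 kW per chip against a 700 W H100 SXM TDP.
print(f"H100 equivalents by power: {2500 / 700:.1f}")        # ~3.6x
```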
If you assume the frontier model is around 1.5 trillion parameters, you'd need an entire wafer-scale N5 chip to run it.
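Sanity-checking the wafer claim with rough numbers. Both the ~135 MTr/mm² achieved N5 density (in the ballpark of shipping N5 chips) and the reuse of the 53/8 transistor ratio are my assumptions, not anything from Taalas:

```python
import math

# Assumptions: ~135 MTr/mm2 achieved logic density on N5, and the same
# 53B-transistors-per-8B-params ratio as in the estimate above.
N5_DENSITY_PER_MM2 = 135e6
WAFER_AREA_MM2 = math.pi * 150**2       # full 300 mm wafer, ~70,700 mm2

transistors = 1.5e12 * (53e9 / 8e9)     # ~9.9e12 for a 1.5T-param model
area = transistors / N5_DENSITY_PER_MM2
print(f"{area:,.0f} mm2 needed vs {WAFER_AREA_MM2:,.0f} mm2 per wafer")
# ~74,000 mm2 -- on the order of an entire 300 mm wafer, as claimed
```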
Very interesting tech for edge inference, though. Robots and self-driving vehicles could make use of these in the distant future if the power draw comes down drastically.
I'd take an army of high-school graduate LLMs to build my agentic applications over a couple of genius LLMs any day.
This is a whole new paradigm of AI.
But I wish there were more "let's scale this thing to the skies" experiments from the labs that can actually afford to run them.
It would certainly be nice, though, if this kind of negative result were published more often, instead of leaving people to guess why a seemingly useful innovation wasn't adopted in the end.