1,000 tok/s sounds impressive, but Cerebras has already done 3,000 tok/s on smaller models. So either Codex-Spark is significantly larger/heavier than gpt-oss-120B, or there's overhead from whatever coding-specific architecture they're using. The article doesn't say which.
The part I wish they'd covered: does speed actually help code quality, or just help you generate wrong code faster? With coding agents the bottleneck usually isn't token generation; it's the model getting stuck in loops or making bad architectural decisions. Faster inference just means you hit those walls sooner.
A different way to read this might be: "Nvidia isn't going to agree to that deal, so we now need to save face by dumping them first."
I imagine sama doesn't like rejection.
They used AMD GPUs before: MI300X via Azure, a year-plus ago.
If so, I have to ask: If you aren’t willing to take the time to write your own work, why should I take the time to read your work?
I didn’t have to worry about this even a week ago.
No, you didn’t realize you had to worry about this until a week ago.
Once you default to "doesn't matter if it's true," you end up being a lot more even-keeled.
Honestly not sure what else fits that bill. Maybe some crazy radar applications? The amount of memory is awfully small for traditional HPC.
The era of “Personal computing” is over
Large-scale capital is not going to make any more investments in microelectronics going forward.
Capital is incentivized to build large data centers and very high-speed private internet: not the public internet, but private networks like Starlink.
So the arc runs like this: the 1970s were the mainframe era of server-side computing, which turned into server-side rendering, then client-side rendering, culminating in the era of the personal computer in your home and finally in your pocket.
Now we're going back to server-side model computation, and that's going to become, effectively, the gateway to all other information, which will be increasingly compartmentalized into remote data centers behind high-speed access.