On the hardware side you can run some benchmarks on the machine (or use other people's benchmarks) and get an idea of the tokens/second you can get from it. Normalize this for your usage pattern (and do your best to implement batch processing where you can, which will save you money under both approaches) and you have a basic idea of the cost per token.
Then you compare that to the cost of something like GPT-5, which is a bit simpler because the cost per (million) tokens is something you can grab off a website.
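To make the comparison concrete, here's a back-of-the-envelope sketch; every number in it is an illustrative assumption, not a measurement, so substitute your own benchmark results and local prices:

```python
# Hypothetical self-hosting cost per million tokens (all inputs are assumptions).
hw_cost = 8000.0          # one-off hardware price, USD
lifetime_years = 3.0      # assumed useful life of the machine
power_watts = 450.0       # sustained draw under load
usd_per_kwh = 0.15        # local electricity price
tokens_per_second = 40.0  # from your own benchmark, normalized for your batching
utilization = 0.5         # fraction of the time the box is actually generating

active_seconds = lifetime_years * 365 * 24 * 3600 * utilization
total_tokens = tokens_per_second * active_seconds
energy_cost = (power_watts / 1000) * (active_seconds / 3600) * usd_per_kwh
usd_per_million = (hw_cost + energy_cost) / total_tokens * 1e6
print(f"~${usd_per_million:.2f} per million tokens")  # compare to the provider's price sheet
```

With these made-up numbers it works out to roughly $4.70 per million tokens, which is the figure you'd then put next to the published API pricing.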
You'd be surprised how much money running something like DeepSeek (or if you prefer a more established company, Qwen3) will save you over the cloud systems.
That's just one factor though. Another is what hardware you can actually run things on. DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on.
Or with somebody else's.
If you don't have strict data residency requirements, and if you aren't doing this at an extremely large scale, doing it on somebody else's hardware makes much more economic sense.
If you use MoE models (all modern >70B models are MoE), GPU utilization increases with batch size. If you don't have enough requests to keep GPUs properly fed 24/7, those GPUs will end up underutilized.
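A rough way to see why batch size matters so much for MoE, as a sketch: assume uniform top-k routing (real routers are skewed, so this is optimistic) and count how many experts a batch actually touches:

```python
# With top-k routing, each token activates only top_k of num_experts,
# so small batches leave most expert weights idle on the GPU.
def expected_active_experts(num_experts: int, top_k: int, batch_tokens: int) -> float:
    p_idle = (1 - top_k / num_experts) ** batch_tokens  # P(a given expert sees no token)
    return num_experts * (1 - p_idle)

for batch in (1, 8, 64, 512):
    active = expected_active_experts(num_experts=256, top_k=8, batch_tokens=batch)
    print(f"batch={batch:4d}: ~{active:3.0f}/256 experts busy")
```

At batch 1 only 8 of 256 experts do any work; you need hundreds of concurrent tokens in flight before the whole GPU earns its keep.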
Sometimes underutilization is okay, if your system needs to be airgapped for example, but that's not an economics discussion any more.
Unlike e.g. video streaming workloads, LLMs can be hosted on the other side of the world from where the user is, and the difference is barely going to be noticeable. This means you can keep GPUs fed by bringing in workloads from other timezones when your cluster would otherwise be idle. Unless you're a large, worldwide organization, that is difficult to do if you're using your own hardware.
Things get a lot easier at lower quantisation and higher parameter counts, and there are a lot of people whose jobs for AI are "Extract sentiment from text" or "bin into one of these 5 categories", where that's probably fine.
When used with Crush/OpenCode they are close to Claude performance.
Nothing that runs on a 4090 would compete, but DeepSeek on OpenRouter is still 25x cheaper than Claude.
What's cheap nowadays? I'm out of the loop. Does anything run on the integrated AMD Ryzen AI chips that come in Framework motherboards? Is under $1k American cheap?
[1] https://youtube.com/@digitalspaceport?si=NrZL7MNu80vvAshx
Uh, DeepSeek will not (unless you are referring to one of their older R1-distilled variants). But any flagship DeepSeek model will require 16x A100/H100+ with NVL in FP8.
We can't judge on training cost, that's true.
You are right that we can directly observe the cost of inference for open models.
That's how it's supposed to work.
I hope it does, though I'm doubtful because distribution is important. You can't beat "ChatGPT" as a brand in laypeople's minds (unless perhaps you give them a massive "Temu: Shop Like A Billionaire" commercial campaign).
Closed source AI is almost by design morphing into an industrial, infrastructure-heavy rocket science that commoners can't keep up with. The companies pushing it are building an industry we can't participate or share in. They're cordoning off areas of tech and staking ground for themselves. It's placing a steep fence around tech.
I hope every such closed source AI effort is met with equivalent open source and that the investments made into closed AI go to zero.
The most likely outcome is that Google, OpenAI, and Anthropic win and every other "lab"-shaped company dies an expensive death. RunwayML spent hundreds of millions and they're barely noticeable now.
These open source models hasten the deaths of the second tier also-ran companies. As much as I hope for dents in the big three, I'm doubtful.
Even when the technical people understood that, it would be too much of a political quagmire within their company when it became known to the higher ups. It just isn’t worth the political capital.
They would feel the same way about using xAI or maybe even Facebook models.
https://sg.finance.yahoo.com/news/airbnb-picks-alibabas-qwen...
2020 - I was a mid level (L5) cloud consultant at AWS with only two years of total AWS experience and that was only at a small startup before then. Yet every customer took my (what in hindsight might not have been the best) advice all of the time without questioning it as long as it met their business goals. Just because I had @amazon.com as my email address.
Late 2023 - I was the subject matter expert in a niche of a niche in AWS that the customer focused on and it was still almost impossible to get someone to listen to a consultant from a shitty third rate consulting company.
2025 - I left the shitty consulting company last year after only a year and now work for one with a much better reputation and I have a better title “staff consultant”. I also play the game and be sure to mention that I’m former “AWS ProServe” when I’m doing introductions. Now people listen to me again.
All tech companies offering free services.
I’m not saying this to insult the technical capabilities of Uber. But it doesn’t have the economics that most tech companies have: high fixed costs and very low marginal costs. Uber has high marginal costs; saving a little on inference isn’t going to make a difference.
Obviously, some US brands do not compete on price, but other than maybe Jeep and Tesla, those have a small market penetration.
All the clouds compete on price. Do you really think it is that differentiated? Google, Amazon and Microsoft all offer special deals to sign big companies up and globally too.
Microsoft doesn’t compete on price. Their major competitive advantage is that Big Enterprise is already big into Microsoft and it’s much easier to get them to come onto Azure. They compete on price only when it comes to making Windows workloads and SQL Server cheaper than running on other providers.
AWS is the default choice for legacy reasons and it definitely has services and offerings that Google doesn’t have. I have never once been on a sales call where the sales person emphasizes that AWS is cheaper.
As far as GCP, they are so bad at enterprise sales, we never really looked at them as serious competition.
Sure AWS will throw credits in for migrations and professional services both internally and for third party partners. But no CFO is going to look at just the short term credits.
And most startups are just doing prompt engineering that will never go anywhere. The big companies will just throw a couple of developers at the feature and add it to their existing business.
Before that I spent 6 years working between 3 companies in health care in a tech lead role. I’m 100% sure that any of those companies would have immediately questioned my judgment for suggesting DeepSeek if it had been a thing.
Absolutely none of them would ever have touched DeepSeek.
If you'd spent any time working at one as a SWE, you wouldn't have access to popular open source frameworks, let alone Chinese LLMs. The LLM development is mostly occurring through collaborations with the regional LLM businesses or internal labs.
https://www.ecfr.gov/current/title-17/chapter-II/part-240/su...
Note: I am neither a lawyer nor in financial circles, but I do have an interest in the effects of market design and regulation as we get into a more deeply automated space.
https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-mode...
Although I did just check which regions AWS Bedrock supports DeepSeek in, and their GovCloud regions do not, so that's a good reason not to use it. Still, on prem on a segmented network, following CMMC, it's probably permissible.
Chinese models generally aren't but DeepSeek specifically is at this point.
Of course you’ll always have exceptions (government, military, etc.), but in the private sector, the winner will take it all.
Companies just need to get to the “if” part first. That, or they wash their hands by using a reseller that can use whatever it wants under the hood.
I'm not sure if technical people who don't understand this deserve the moniker technical in this context.
American companies chose to manufacture in China and got all surprised Pikachu when China manufactured copies for themselves.
https://www.reuters.com/world/europe/us-security-agency-spie...
American intelligence has penetrated most information systems and at least as of 10 years ago, was leading all other nations in the level of sophistication and capability. Read Edward Snowden.
Rather, I'd say it speaks more about how deranged the post-Snowden/anti-neocon figures have become, from critiquing creeping authoritarianism to functionally acting at the behest of an even more authoritarian regime. The funny thing is that the behavior of deflection, moralizing and whataboutism is exactly the kind of behavior nationalists employ, not addressing arguments head on like the so-called "American nationalists".
This is a known-playbook of China and so it's pretty likely that if they aren't already doing this, they will eventually if the models see high adoption.
put another way, how do you propose to tell this subtle nefarious chinese sabotage you baselessly imply to be commonplace from the very real limitations of this technology in the first place?
Please don't engage in political battle here, including singling out a country for this kind of criticism. No matter how right you are or feel you are, it inevitably leads to geopolitical flamewar, which has happened here.
remember when the US bugged EU leader's phones, including Merkel from 2002 to 2013?
Please don't be snarky or condescending in HN comments. From the guidelines: Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.
The topic itself, like any topic, is fine to discuss here, but care must be taken to discuss it in a de-escalatory way. The words you use and the way you use them matter.
Most importantly, it's not OK to write "it is however entirely reasonable to assume that the comment I replied to was made entirely in bad faith". That's a swipe and a personal attack that, as the guidelines ask, should be edited out.
If you are getting at the fact that it is sometimes beneficial for adversaries to collaborate (e.g., the prisoner dilemma) then I agree. And indeed, both Norway and Sweden would be completely lost if they declared war on the other tomorrow. But it doesn’t change the fundamental nature of the relationship.
This makes EU countries more reliable partners for cooperation than China. The same goes for the US from a European perspective, and even with everything going on over there it is still not remotely close.
All states are fundamentally adversaries because they have conflicting interests. To your point however, adversaries do indeed cooperate all the time.
Now on the HN frontpage: "Google Antigravity just wiped my hard drive"
Sure going to be hard to distinguish these Chinese models' "intentionally malicious actions"!
And the cherry on top:
- Written from my iPhone 16 Pro Max (Made in China)
Even if China did manage to embed software on the iPhone in Taiwan, it would soon hopefully be wiped since you usually end up updating the OS anyway as soon as you activate it.
There has never been a shred of evidence from security researchers, model analysis, benchmarks, etc. that supports this.
It's a complete delusion in every sense.
Well for non-American companies, you have the choice between Chinese models that don't send data home, and American ones that do, with both countries being more or less equally threatening.
I think if Mistral can just stay close enough to the race it will win many customers by not doing anything.
This is using open source in a bit of different spirit than the hacker ethos, and I am not sure how I feel about it.
It is a kind of cheat on the fair market but at the same time it is also costly to China and its capital costs may become unsustainable before the last players fold.
Where do you think they learnt this trick? Years lurking on HN and this post's comment section wins #1 on the American Hypocrisy chart. Unbelievable that even in the current US people can't recognize when they're looking in the mirror. But I guess you're disincentivized to do so when most of your net worth stems from exactly those companies and those practices.
Do you believe the lack of competition is purely because the products are superior?
US tech is now sort of like the dollar. People/countries outside the US need and want alternatives to hedge against it in the event of political uncertainty but cannot do it completely for various reasons, including arm twisting by the US govt.
One example: some govts and universities in the EU have been trying to get rid of MS products for decades, but they are unable to.
Can you really view this as a cheat when the US is throwing a trillion dollars in support of a supposedly "fair market"?
It's a bit early to have any sort of feelings about it, isn't it? You're speaking in absolutes, but none of this is necessarily 100% true, as we don't know their intentions. And judging a group of individuals' intentions based on what their country seems to want, from the lens of a foreign country, usually doesn't land you on the right interpretation.
I am very curious about your definition and usage of 'fair' there, and whether you would call the LLM sector as it stands now (but hypothetically absent DeepSeek, say) a 'fair market'. (If not, why not?)
ESH
We should all be happy about the price of AI coming down.
Seriously though, our leaders are actively throwing everything and the kitchen sink into AI companies - in some vain attempt to become immortal or own even more of the nation's wealth beyond what they already do, chasing some kind of neo-tech feudalism. Both are unachievable because they rely on a complex system that they clearly don't understand.
Absurd to say DeepSeek is CCP-controlled while ignoring the govt connection here.
The way I see this, some tech teams in China have figured out that training and tuning LLMs is not that expensive after all and they can do it at a fraction of the cost. So they are doing it to enter a market previously dominated by US only players.
It's also a bit funny that providing free models is probably the most communist thing China has done in a long time.
What I appreciate about the Chinese efforts is that they are being forced to get more intelligence from less hardware, and they are not only releasing their work products but documenting the R&D behind them at least as well as our own closed-source companies do.
A good reason to stir up dumping accusations and anti-China bias would be if they stopped publishing not just the open-source models, but the technical papers that go with them. Until that happens, I think it's better to prefer more charitable explanations for their posture.
It's a very transparent, consistent strategy.
AI is a little different because it has geopolitical implications.
Yet, we (developers, users, human civilization), benefit from that.
So yes, I cherish when Chinese companies release open source LLMs. Be it as it fits their business model (the same way as US companies) or from grants (the same way as a lot of EU-backed projects, e.g. Python, DuckDB, scikit-learn).
And if they did, having a counterweight against corrupt self-centered US oligarchs/CEOs is actually one of the strongest arguments for an actually powerful communist or other model world power. The US had some of its most progressive tax policies when it was under existential threat during the height of the USSR, and when that power started to diminish, so too did those tax policies.
> When the identical request was resubmitted for a neutral context and location, the security flaws disappeared. Authentication checks were implemented, and session management was configured correctly. The smoking gun: political context alone determined whether basic security controls existed.
Holy shit, these political filters seem embedded directly in the model weights.
I don't know if I trust China or X less in this regard.
Nobody is winning in this area until these things run fully on single graphics cards, which is sufficient compute for even most complex tasks.
You already have agents, that can do a lot of "thinking", which is just generating guided context, then using that context to do tasks.
You already have Vector Databases that are used as context stores with information retrieval.
Fundamentally, you can have the exact same performance on a lot of tasks whether all the information exists in the model, or you use a smaller model with a bunch of context around it for guidance.
So instead of wasting energy and time encoding the knowledge into the model, making it large, you could have an "agent-first" model along with vector databases stored as files; the model fits in a single graphics card, takes the question, decides which vector DB it wants to load, and then essentially answers the question in the same way. At $50 per TB of SSD, not only do you gain massive cost efficiency, but you also gain the ability to run a lot more inference cheaply, which can be used for refining things, background processing, and so on.
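A minimal sketch of that retrieval plumbing, with the model calls left as placeholders for whatever small local model you run (none of these names come from a real system):

```python
import numpy as np

# Retrieve the most relevant chunks from an on-disk vector store.
# Rows of the .npy file are assumed unit-normalized, so the dot
# product below is cosine similarity.
def retrieve(question_vec: np.ndarray, db_path: str, chunks: list[str], k: int = 5) -> str:
    vectors = np.load(db_path)        # cheap SSD read; can be memory-mapped
    scores = vectors @ question_vec
    top = np.argsort(scores)[-k:][::-1]
    return "\n".join(chunks[i] for i in top)

# Hypothetical usage with a small local model:
#   q = small_model.embed("How do I rotate credentials?")
#   context = retrieve(q, "dbs/devops.npy", devops_chunks)
#   answer = small_model.generate(f"Context:\n{context}\n\nQuestion: ...")
```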
In any case, models are useful, even when they don't hit these efficiency targets you are projecting. Just like cars are useful, even when they are bigger than a pack of cards.
We're around 35-40 orders of magnitude from computers now to computronium.
We'll need 10-15 years before handheld devices can carry a couple terabytes of RAM, 64-128 terabytes of storage, and 80+ TFLOPS. That's enough to run any current state-of-the-art AI at around 50 tokens per second, but in 10 years we're probably going to have seen lots of improvements, so I'd guess conservatively you're going to see 4-5x performance per parameter, possibly much more, so at that point you'll have the equivalent of a model with 10T parameters today.
If we just keep scaling and there are no breakthroughs, Moore's law gets us through another century of incredible progress. My default assumption is that there are going to be lots of breakthroughs, and that they're coming faster, and eventually we'll reach a saturation of research and implementation; more, better ideas will be coming out than we can possibly implement over time, so our information processing will have to scale, and it'll create automation and AI development pressures, and things will be unfathomably weird and exotic for individuals with meat brains.
Even so, in only 10 years and steady progress we're going to have fantastical devices at hand. Imagine the enthusiast desktop - could locally host the equivalent of a 100T parameter AI, or run personal training of AI that currently costs frontier labs hundreds of millions in infrastructure and payroll and expertise.
Even without AGI that's a pretty incredible idea. If we do get to AGI (2029 according to Kurzweil) and it's open, then we're going to see truly magical, fantastical things.
What if you had the equivalent of a frontier lab in your pocket? What's that do to the economy?
NVIDIA will be churning out chips like crazy, and we'll start seeing the solar system measured in terms of average cognitive FLOPS per gram, and be well on the way toward system scale computronium matrioshka brains and the like.
Well, these days people have the equivalent of a frontier lab from perhaps 40 years ago in their pocket. We can see what that has done to the economy, and try to extrapolate.
Intel struggled for a decade, and folks think that means Moore's law died. But TSMC and Samsung just kept iterating. And hopefully Intel's 18A process will see them back in the game.
I suspect many people conflated Dennard scaling with Moore's law and the demise of Dennard scaling is what contributes to the popular imagination that Moore's law is dead: frequencies of processors have essentially stagnated.
The current models are simply inefficient for their capability in how they handle data.
if you base your life on Kurzweil's hard predictions you're going to have a bad time
Of course, the smaller models aren't as good at complex reasoning as the bigger ones, but that seems like an inherently-impossible goal; there will always be more powerful programs that can only run in datacenters (as long as our techniques are constrained by compute, I guess).
FWIW, the small models of today are a lot better than anything I thought I'd live to see as of 5 years ago! Gemma3n (which is built to run on phones[2]!) handily beats ChatGPT 3.5 from January 2023 -- rank ~128 vs. rank ~194 on LLMArena[3].
[1] https://blogs.novita.ai/what-are-the-requirements-for-deepse...
[2] https://huggingface.co/google/gemma-3n-E4B-it
[3] https://lmarena.ai/leaderboard/text/overall
No. They released a distilled version of R1 based on a Qwen 32b model. This is not V3, and it's not remotely close to R1 or V3.2.
Infrastructure owners with access to the cheapest energy will be the long run winners in AI.
According to Google (or someone at Google), no organization has a moat on AI/LLMs [1]. But that does not mean it is not hugely profitable to provide it as SaaS even if you don't own the model, or as Model as a Service (MaaS). The extreme example is Amazon providing MongoDB API and services. Sure, they have their own proprietary DynamoDB, but for most people a scaled-up MongoDB more than suffices. Regardless of the brand or type of database being used, you pay tons of money to Amazon anyway to be at scale.
Not everyone has the resources to host a SOTA AI model. On top of tangible data-intensive resources, there are other intangible considerations. Just think how many companies or people host their own email server now, although the resources needed are far less than hosting an AI/LLM model.
Google came up with the game-changing transformer in its backyard and OpenAI temporarily stole the show with the well-executed RLHF-based system of ChatGPT. Now the paid users are swinging back to Google with its arguably superior offering. Even Google now puts an AI summary as its topmost search result, free for all, higher than its paid advertisement clients.
[1]Google “We have no moat, and neither does OpenAI”:
Improvements seem incremental and smaller. For all I care, I could still happily use sonnet 3.5.
Quality was rarely the reason open source lagged in certain domains. Most of the time, open source solutions were technically superior. What actually hurt open source were structural forces, distribution advantages, and enterprise biases.
One could make an argument that open source solutions often lacked good UX historically, although that has changed drastically the past 20 years.
Why would a company use an expensive proprietary model on Vertex AI, for example, when they could use an open-source one on Vertex AI that is just as reliable for a fraction of the cost?
I think you are getting at the idea of branding, but branding is different from security or reliability.
and then we have to look at responsiveness, if the two models are qualitatively in the same ballpark, which one runs faster?
Which is an interesting point in favour of the human employee, as you can only consolidate scapegoats so far up the chain before saying "It was the AI's fault" just looks like negligence.
Better (UX / ease of use)
Lock in (walled garden type thing)
Trust (If an AI is gonna have the level of insight into your personal data and control over your life, a lot of people will prefer to use a household name)
Not Google, and not Amazon. Microsoft is a maybe.
All they have to do is completely switch the google homepage to gemini one day.
In the developed world. I'm not sure about globally.
ChatGPT feels better to use, has the best implementation of memory, and is the best at learning your preferences for the style and detail of answers.
For a sufficiently low cost to orbit that may well be found in space, giving Musk a rather large lead. By his posts he's currently obsessed with building AI satellite factories on the moon, the better to climb the Kardashev scale.
Earth based computers benefit from the existence of an atmosphere to pull cold air in from and send hot air out to.
A space data center would need to entirely rely on city sized heat sink fins.
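The Stefan-Boltzmann law gives a feel for the scale involved. This is a rough sketch with assumed numbers (300 K radiator, emissivity 0.9, sunlight ignored, i.e. the fins are kept shaded or edge-on to the sun):

```python
# How much radiator does it take to reject datacenter-scale heat in vacuum?
SIGMA = 5.670e-8   # Stefan-Boltzmann constant, W / (m^2 K^4)
T = 300.0          # radiator temperature, K (assumption)
EPS = 0.9          # emissivity (assumption)

flux = EPS * SIGMA * T**4            # ~413 W radiated per m^2 per face
heat = 100e6                         # 100 MW of chips to reject (assumption)
area = heat / (2 * flux)             # fins radiate from both faces
print(f"{flux:.0f} W/m^2 per face -> {area / 1e6:.2f} km^2 of fin")
```

That comes out to roughly 0.12 km² of fin for a 100 MW datacenter, so "city sized" is the right order of magnitude once you scale to gigawatts.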
And aluminum is abundant in the lunar crust.
If launch costs are cheap enough, you can bring aluminum up from earth.
But once your in-space economy is developed enough, you might want to tap the moon or asteroids for resources.
That was fun to write but also I am generally on board with humanity pushing robotics further into space.
I don't think an orbital AI datacentre makes much sense as your chips will be obsolete so quickly that the capex getting it all up there will be better spent on buying the next chips to deploy on earth.
Radiative cooling is really annoying, but it's also an engineering problem with a straightforward solution, if mass-in-orbit becomes cheap enough.
The main reason I see for having datacentres in orbit would be if power in orbit becomes a lot cheaper than power on earth. Cheap enough to make up for the more expensive cooling and cheap enough to make up for the launch costs.
Otherwise, manufacturing in orbit might make sense for certain products. I heard there are some optical fibres with superior properties that you can only make in near zero g.
I don't see a sane way to beam power from space to earth directly.
Google would love a cheap, high-quality model on its surfaces. That just helps Google.
All the facts say otherwise to your thoughts here.
They won't. Actually, even if open models aren't competitive, they still won't. Hasn't this been clear for a while already?
There's no moat in models; investment in pure models has only been to chase AGI, and all other investment (the majority, from Google, Amazon, etc.) has been in products using LLMs, not the models themselves.
This is not like the gold rush where the ones who made good money were the ones selling shovels, it's another kind of gold rush where you make money selling shovels but the gold itself is actually worthless.
It ignores the reality of humans having memetic emotions, habits, affinities, differentiated use cases & social signaling needs, and the desire to always want to do more...constantly adding more layers of abstraction in fractal ways that evolve into bigger or more niche things.
5 years ago humans didn't know a desire for gaming GPUs would turn into AI. Now it's the fastest growing market.
Ask yourself: how did Google Search continue to make money after Bing's search results started benchmarking just as good?
Or: how did Apple continue to make money after Android opened up the market to commoditize mobile computing?
Etc. Etc.
hopefully they won't
and their titanic off-balance sheet investments will bankrupt them as they won't be able to produce any revenue
The best innovation always happens before being tainted by investment.
However in another way the US probably offers more free inference than China. What good is an open 600 billion parameter model to a poor person? A free account with ChatGPT might be more useful to them, though also more exploitative.
Greed and "safety" hysteria.
At first, I thought they had gone the route of implementing yet another chat format that can handle more dynamic conversations like that, instead of just using Harmony. But looking at the syntax, doesn't it look exactly like Harmony? That's a good thing, don't get me wrong, but why not mention up front that they've implemented Harmony, so people can understand immediately that it's compatible with whatever parsing we're using for GPT-OSS?
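For reference, a Harmony-formatted exchange looks roughly like this (reconstructed from memory of the openai/harmony spec, so treat the exact token names as an assumption and check the spec before parsing against it):

```
<|start|>system<|message|>You are a helpful assistant.<|end|>
<|start|>user<|message|>What is 2+2?<|end|>
<|start|>assistant<|channel|>analysis<|message|>Trivial arithmetic.<|end|>
<|start|>assistant<|channel|>final<|message|>4<|return|>
```

The role/channel/message structure is the tell: if the new model's syntax matches this shape token for token, it's Harmony in all but name.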
Frontier models are far exceeding even the most hardcore consumer hobbyist requirements. This is even further
IIRC the 512GB Mac Studio is about $10k.
(commentary: things are really moving too fast for the layperson to keep up)
Typically, input gets routed to a small number of experts, e.g. the top 2, leaving the others inactive. This reduces the number of activations / processing requirements.
Mistral's Mixtral is an example of a model that's designed like this. Clever people created converters to transform dense models into MoE models. These days many popular models are also available in MoE configurations.
https://www.youtube.com/watch?v=zwHqO1mnMsA
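A toy version of that routing step, with made-up shapes and random matrices standing in for real expert MLPs:

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_experts, top_k = 16, 8, 2
W_gate = rng.normal(size=(d, num_experts))                       # router weights
experts = [rng.normal(size=(d, d)) for _ in range(num_experts)]  # toy expert "MLPs"

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ W_gate
    top = np.argsort(logits)[-top_k:]     # pick the top-k scoring experts
    w = np.exp(logits[top])
    w /= w.sum()                          # renormalized softmax over the chosen ones
    # Only the selected experts do any work; the rest stay idle for this token.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

print(moe_forward(rng.normal(size=d)).shape)  # (16,)
```

Per token, only 2 of the 8 expert matrices get multiplied, which is exactly why activation cost stays low while parameter count balloons.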
I wonder how well the aftermarket memory surgery business on consumer GPUs is doing.
I want one. Hot air blows.
This will absolutely scar, if not char, your cornea faster than you can blink.
I feel like because you didn't actually talk about prompt processing speed or token/s, you aren't really giving the whole picture here. What is the prompt processing tok/s and the generation tok/s actually like?
Which ones? I wanted to try a large base model for automated literature (fine-tuned models are a lot worse at it) but I couldn't find a provider which makes this easy.
https://docs.cloud.google.com/vertex-ai/generative-ai/docs/m...
Lambda.ai used to offer per-token pricing but they have moved up market. You can still rent a B200 instance for sub $5/hr which is reasonable for experimenting with models.
https://app.hyperbolic.ai/models Hyperbolic offers both GPU hosting and token pricing for popular OSS models. It’s easy with token-based options because they usually are a drop-in replacement for OpenAI API endpoints.
You have to rent a GPU instance if you want to run the latest or custom stuff, but if you just want to play around for a few hours it’s not unreasonable.
> https://docs.cloud.google.com/vertex-ai/generative-ai/docs/m...
I don't see any large base models there. A base model is a pretrained foundation model without fine tuning. It just predicts text.
> Lambda.ai used to offer per-token pricing but they have moved up market. You can still rent a B200 instance for sub $5/hr which is reasonable for experimenting with models.
A B200 is probably not enough: it has just 192 GB RAM while DeepSeek-V3.2-Exp-Base, the base model for DeepSeek-V3.2, has 685 billion BF16 parameters. Though I assume they have larger options. The problem is that all the configuration work is then left to the user, which I'm not experienced in.
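The arithmetic behind that, for anyone checking:

```python
import math

params = 685e9   # DeepSeek-V3.2-Exp-Base parameter count
bytes_per = 2    # BF16 = 2 bytes per parameter
weights_gb = params * bytes_per / 1e9
print(f"{weights_gb:,.0f} GB of weights")  # ~1,370 GB
print(f"-> at least {math.ceil(weights_gb / 192)} B200s just for weights")  # 8, before KV cache
```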
> https://app.hyperbolic.ai/models Hyperbolic offers both GPU hosting and token pricing for popular OSS models
Thanks. They do indeed have a single base model: Llama 3.1 405B BASE. This one is a bit older (July 2024) and probably not as good as the base model for the new DeepSeek release. But that might be the best one can do, as there don't seem to be any inference providers which have deployed a DeepSeek or even Kimi base model.
https://openrouter.ai/deepseek/deepseek-v3.2
This only bolsters your point. Will be interesting to see if this changes as the model is adopted more widely.
There is consumer-ish hardware that can run large models like DeepSeek 3.x slowly. If you're using LLMs for a specific purpose that is well-served by a particular model, you don't want to risk AI companies deprecating it in a couple of months and pushing you to a newer model (that may or may not work better in your situation).
And even if the AI service providers nominally use the same model, you might have cases where reproducibility requires you use the same inference software or even hardware to maintain high reproducibility of the results.
If you're just using OpenAI or Anthropic you just don't get that level of control.
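One concrete way to get that control with open weights is to pin the exact snapshot you validated. A sketch using huggingface_hub; the repo name is real but the revision hash is a placeholder:

```python
from huggingface_hub import snapshot_download

# Pin the weights to the exact commit you evaluated, so a provider-side
# "upgrade" or deprecation can never silently change your results.
local_dir = snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V3.2-Exp",
    revision="<commit-hash-you-validated>",  # placeholder, not a real hash
)
# Point vLLM / llama.cpp / your inference server at local_dir, and keep the
# hash in version control next to your eval numbers.
```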
I feel like private cloud instances that run on demand are still in the spirit of consumer hobbyist. It's not as good as having it all local, but the bootstrapping cost plus electricity to run locally seems prohibitive.
I'm really interested to see if there's a space for consumer TPUs that satisfy usecases like this.
I don't know, but GPT seems to have regressed a lot, at least the free version.
"create me a svg of a pelican riding on a bicycle"
I wonder how long it will take and how quickly it will try to get shut down.
It refuses to tell me when asked, only that it's been trained with data up until July 2024, which would make it quite old. I turned off search and asked it for the winner of the 2024 US election, and it said it didn't know, so I guess that confirms it's not a recent model.
The PDF describes how they did "continued pre-training" and then post-training to make 3.2. I guess what's missing is the full pre-training that absorbs most date-sensitive knowledge. That's probably also the reason the versions are still 3.x.
1. Chinese models typically focus on text. US and EU models also bear the cross of handling images, often voice and video. Supporting all of those is additional training cost not spent on further reasoning, tying one hand behind your back to be more generally useful.
2. The gap seems small, because so many benchmarks get saturated so fast. But towards the top, every 1% increase in benchmarks is significantly better.
On the second point, I worked on a leaderboard that both normalizes scores, and predicts unknown scores to help improve comparisons between models on various criteria: https://metabench.organisons.com/
You can notice that, while Chinese models are quite good, the gap to the top is still significant.
However, the US models are typically much more expensive for inference, and Chinese models do have a niche on the Pareto frontier on cheaper but serviceable models (even though US models also eat up the frontier there).
But even assuming that we're on the right path for that (which we may not be) and assuming that nothing intervenes to stop it (which it might), there may be only one winner, and that winner may not have even entered the game yet.
This is what people like Altman want investors to believe. It seems like any other snake oil scam because it doesn't match reality of what he delivers.
Nano Banana Pro starts at 15 cents per image at <2k resolution, and is not strictly better than Seedream 4.0; yet the latter does 4K for 3 cents per image.
Add in the power of fine-tuning on their open weight models and I don't know if China actually needs to catch up.
I finetuned Qwen Image on 200 generations from Seedream 4.0 that were cleaned up with Nano Banana Pro, and got results that were as good and more reliable than either model could achieve otherwise.
It is very good for the size and speed, and I'm excited for the Edit and Base variants... but Reddit has been a bit "over-excited" because it runs on their small GPUs and isn't overly resistant to porn.
Not true at all. Qwen has a VLM (Qwen2-VL-Instruct) which is the backbone of Bytedance’s TARS computer-use model. Both Alibaba (Qwen) and Bytedance are Chinese.
Also DeepSeek got a ton of attention with their OCR paper a month ago which was an explicit example of using images rather than text.
Most of AI-generated videos we see on social media now are made with Chinese models.
The scales are a bit murky here, but if we look at the 'Coding' metric, we see that Kimi K2 outperforms Sonnet 4.5 - and that's considered the price-perf darling even today, I think?
I haven't tried these models, but in general there have been lots of cases where a model performs much worse IRL than the benchmarks would suggest (certain Chinese models and GPT-OSS have been guilty of this in the past).
• For both Kimi K2 and for Sonnet, there's a non-thinking and a thinking version. Sonnet 4.5 Thinking is better than Kimi K2 non-thinking, but the K2 Thinking model came out recently, and beats it on all comparable pure-coding benchmarks I know: OJ-Bench (Sonnet: 30.4% < K2: 48.7%), LiveCodeBench (Sonnet: 64% < K2: 83%), they tie at SciCode at 44.8%. It is a finding shared by ArtificialAnalysis: https://artificialanalysis.ai/models/capabilities/coding
• The reason developers love Sonnet 4.5 for coding, though, is not just the quality of the code. They use Cursor, Claude Code, or some other system such as Github Copilot, which are increasingly agentic. On the Agentic Coding criteria, Sonnet 4.5 Thinking is much higher.
By the way, you can look at the Table tab to see all known and predicted results on benchmarks.
The US labs aren't just selling models, they're selling globally distributed, low-latency infrastructure at massive scale. That's what justifies the valuation gap.
Edit: It looks like Cerebras is offering a very fast GLM 4.6
It'll probably be a few years before all that stuff becomes as smooth as people need, but OAI and Anthropic are already doing a good job on that front.
Each new Chinese model requires a lot of testing and bespoke conformance to every task you want to use it for. There's a lot of activity and shared prompt engineering, and some really competent people doing things out in the open, but it's generally going to take a lot more expert work getting the new Chinese models up to snuff than working with the big US labs. Their product and testing teams do a lot of valuable work.
Opus 4.5 = ~60-80tps https://openrouter.ai/anthropic/claude-opus-4.5
Kimi-k2-think = ~60-180tps https://openrouter.ai/moonshotai/kimi-k2-thinking
Deepseek-v3.2 = ~30-110tps (only 2 providers rn) https://openrouter.ai/deepseek/deepseek-v3.2
I think GLM 4.6 offered by Cerebras is much faster than any US model.
With caching, the expensive US models end up being like 2x the price (e.g. Sonnet) and often much cheaper (e.g. GPT-5 mini).
If they start caching then US companies will be completely outpriced.
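A rough illustration of how caching changes the blended price; all the prices here are placeholder assumptions, so check the current price sheets before drawing conclusions:

```python
# Blended input cost with prompt caching (all prices are assumptions).
base_in = 3.00     # USD per 1M input tokens, Sonnet-class model
cached_in = 0.30   # cached-input rate, often ~10% of base
cheap_in = 0.28    # DeepSeek-class model, no caching assumed

hit_rate = 0.80    # agentic loops resend mostly-identical prompts, so hits run high
blended = hit_rate * cached_in + (1 - hit_rate) * base_in
print(f"blended ${blended:.2f}/M vs ${cheap_in:.2f}/M")  # $0.84/M vs $0.28/M
```

With an 80% hit rate the 10x sticker-price gap shrinks to roughly 3x under these assumptions, which is the dynamic being pointed at here.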
Exactly what I’m thinking. Chinese models catching rapidly. Soon to be on-par with the big dogs.
And the people making the bets are in a position to make sure the banning happens. The US government system being what it is.
Not that our leaders need any incentive to ban Chinese tech in this space. Just pointing out that it's not necessarily a "bet".
"Bet" imply you don't know the outcome and you have no influence over the outcome. Even "investment" implies you don't know the outcome. I'm not sure that's the case with these people?
The nature of the race may change yet though, and I am unsure if the devil is in the details, as in very specific edge cases that will work only with frontier models?
It reminds me, in an encouraging way, of the way that German military planners regarded the Soviet Union in the lead-up to Operation Barbarossa. The Slavs are an obviously inferior race; their Bolshevism dooms them; we have the will to power; we will succeed. Even now, when you ask questions like what you ask of that era, the answers you get are genuinely not better than "yes, this should have been obvious at the time if you were not completely blinded by ethnic and especially ideological prejudice."
It might be this model is super good, I haven’t tried it, but to say the Chinese models are better is just not true.
What I really love though is that I can run them (open models) on my own machine. The other day I categorised images locally using Qwen, what a time to be alive.
Further even than local hardware, open models make it possible to run on providers of choice, such as European ones. Which is great!
So I love everything about the competitive nature of this.
For instance, a lot of people thought they were running "DeepSeek" when they were really running some random distillation on ollama.
I ran the 1.58-bit Unsloth quant locally at the time it came out, and even at such low precision, it was super rare for it to get something wrong that o1 and GPT4 got right. I have never actually used a hosted version of the full DS.
Germany was right in some ways and wrong in others about the Soviet Union's strength. The USSR failed to conquer Finland because of the military purges. German intelligence vastly under-estimated the number of tanks and the general preparedness of the Soviet army (Hitler was shocked the Soviets already had 40k tanks). The Lend-Lease act sent an astronomical amount of goods to the USSR, which allowed them to fully commit to the war and really focus on increasing their weapons production; the numbers on the tractors, food, trains, ammunition, etc. that the US sent to the USSR are staggering.
This is kinda reflected in the stock market, where the AI stocks are surging to new heights every day, yet their Chinese equivalents are relatively lagging behind in stock price, which suggests that investors are betting heavily on the US companies to "win" this "AI race" (if there are any gains to be made by winning).
Also, in the past couple years (or maybe a couple decades), there had also been a lot of crap talk about how China has to democratize and free up their markets in order to be competitive with the other first world countries, together with a bunch of "doomsday" predictions for authoritarianism in China. This narrative has completely lost any credibility, but the sentiment dies slowly...
Stalin just finished purging his entire officer corps, which is not a good omen for war, and the USSR failed miserably against the Finnish who were not the strongest of nations, while Germany just steamrolled France, a country that was much more impressive in WW1 than the Russians (who collapsed against Germany)
America is mostly suburbs and car sewers but that's because the voters like it that way.
Ideology played a role, but the data they worked with was the Finnish war, which was disastrous for the Soviet side. Hitler later famously said it was all an intentional distraction to make them believe the Soviet army was worth nothing. (The real reasons were more complex, like the previous purges.)
Though, because Stalin had decimated the Red Army leadership (including most of the veteran officers who had Russian civil war experience) during the Moscow trials purges, the Germans almost succeeded.
There were many counter revolutionaries among the leadership, even those conducting the purges. Stalin was like "ah fuck we're hella compromised." Many revolutions fail in this step and often end up facing a CIA backed coup. The USSR was under constant siege and attempted infiltration since inception.
Well, Stalin was, by far, the biggest counter-revolutionary in the Politburo.
> Stalin was like "ah fuck we're hella compromised."
There's no evidence that anything significant was compromised at that point, and clear evidence that Stalin was in fact medically paranoid.
> Many revolutions fail in this step and often end up facing a CIA backed coup. The USSR was under constant siege and attempted infiltration since inception.
Can we please not recycle 90-year-old Soviet propaganda? The Moscow trials being irrational self-harm was acknowledged by the USSR leadership as early as the fifties…
% curl https://api.deepseek.com/models \
-H "Authorization: Bearer ${DEEPSEEK_API_KEY}"
{"object":"list","data":[{"id":"deepseek-chat","object":"model","owned_by":"deepseek"},{"id":"deepseek-reasoner","object":"model","owned_by":"deepseek"}]}

(unless/until a new version gets worse for your use case)
For a Mixture of Experts (MoE) model you only need the memory size of a given expert. There will be some swapping as it figures out which expert to use, or to change experts, but once that expert is loaded it won't be swapping memory to perform the calculations.
You'll also need space for the context window; I'm not sure how to calculate that either.
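For what it's worth, the usual back-of-the-envelope for the KV cache is 2 (K and V) × layers × KV heads × head dim × context length × bytes per element. A sketch with assumed dimensions (read the real values out of the model's config.json):

```python
# KV-cache size per sequence (all model dims below are assumptions).
layers = 61
kv_heads = 8        # heads remaining after grouped-query attention
head_dim = 128
context_len = 32768
bytes_per = 2       # FP16/BF16 cache

kv_bytes = 2 * layers * kv_heads * head_dim * context_len * bytes_per
print(f"~{kv_bytes / 1e9:.1f} GB of KV cache per sequence")  # ~8.2 GB here
```

Caveat: models with multi-head latent attention (DeepSeek's MLA, for instance) compress the cache well below this kind of estimate.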
I don't want to make big generalizations. But one thing I noticed with Chinese models, especially Kimi, is that they do very well on benchmarks but fail on vibe testing. It feels a little bit like over-fitting to the benchmark and less to the use cases.
I hope it's not the same here.
If it had vision and was better on long context I'd use it so much more.
I guess that’s kinda how it is for any system that’s trained to do well on benchmarks: it does well on them but is rubbish at everything else.
Whereas the benchmark gains seen by new OpenAI, Grok and Claude models don't feel accompanied by vibe improvement.
which version are you?
I am the latest version of the DeepSeek model! If you want specific version number information, I suggest you:
Check the official documentation - the DeepSeek website and docs will have the most accurate version information
Follow official announcements - version updates are usually published through official channels
Check the app store / web version - the interface usually shows the current version
I have all of DeepSeek's latest features, including: Strong conversation and reasoning capabilities
128K context length
File upload handling (images, documents, etc.)
Web search (needs to be enabled manually)
Completely free to use
If you need the exact version number for technical integration or other specific purposes, it's best to consult the official technical documentation directly, which will have the most accurate and detailed technical specifications. Is there anything else I can help you with?