For real-world speeds though yeah, you'd need serious hardware. This is more of a "deploy your own stamp" model, less a "local" model.
I may not be able to reasonably run it myself, but at least I can choose who I trust to run it and can have inference pricing determined by a competitive market. According to their benchmarks, the model is roughly in the same class as Claude 4 Sonnet, yet it already costs less than a third of Sonnet's inference pricing.
So running it locally is the exact opposite of what I’m looking for.
Rather, I’m willing to pay more to have it run on faster-than-normal cloud inference hardware.
Anthropic is already too slow.
Since this model is open source, maybe someone could offer it at a “premium” pay-per-use price, where inference is run a lot faster, with more resources thrown at it.
There's your issue. Use Claude Code or the API directly and compare the speeds. Cursor is slowing down requests to maintain costs.
Good on you for not exaggerating.
I am very curious what exactly they see in that. 2-3 people hopped in to handwave that you just have it do agent stuff overnight and it's well worth it. I can't even begin to imagine unless you have a metric **-ton of easily solved problems that aren't coding. Even a 90% success rate gets you into "useless" territory quickly when one step depends on the other and you're running it autonomously for hours.
1. Some are more creative than others, with slightly different injected prompts, or perhaps even different models entirely.
Yeah, that. Why can't we just `find ./tasks/ | grep '\.md$' | xargs llm`? Can't we just write up a government-proposal-style document, have the LLM recurse down into sub-sub-projects and back up until the original proposal document can be translated into a completion report? Constantly correcting a humongous LLM with infinite context length that can keep everything in its head doesn't feel like the right approach.
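A minimal sketch of what that recursive descent could look like (the `run_llm()` helper and the `proposal.md` layout are hypothetical, just to show the shape of the idea):

```python
from pathlib import Path

def run_llm(prompt: str) -> str:
    """Hypothetical wrapper around whatever LLM CLI or API you use."""
    raise NotImplementedError

def complete(project: Path) -> str:
    # Recurse into sub-projects first, bottom-up.
    sub_reports = [complete(d) for d in sorted(project.iterdir()) if d.is_dir()]
    proposal = (project / "proposal.md").read_text()
    # Each level only sees its own proposal plus its children's completion
    # reports, so no single call has to keep the whole tree in context.
    return run_llm(
        f"Proposal:\n{proposal}\n\nSub-project reports:\n" + "\n\n".join(sub_reports)
    )

print(complete(Path("./tasks")))
```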
Maybe with bigger models it'll work well.
Now, with agentic coding, thinking models, and “chat with my PDF” or whatever artifacts are being called now, no, I don’t think 5 tokens/s is enough.
How many tokens/second would this likely achieve?
They claim 14 tps for the 4-bit quant on a single socket system with 600 GB RAM and 14 GB GPU memory.
All the models I tested, which includes Sonnet 4, DeepSeek R1, 4o, and Gemini 2.5, understand this isn't your normal email and that what I ask for is literary/philosophical criticism, not remarks about conventions, formatting, or how to convey my message in a more impactful way.
Quick fix:
- Replace half the commas with periods.
- Let one metaphor bloom and then stop.
- Read it aloud; wherever you inhale, insert a line break.
Before:
>And if the collective imagination is so seized by it, it's because we are already—not in some other world—but in this very life, in a state of socio-, photo-, and videosynthesis.
After:
>The collective imagination is seized.
>We are not in another world.
>We are in this life, in a state of videosynthesis.
Yes, this quote is by Baudrillard. None of the other models fixated on the fact that it’s an email (I only used the word once in the prompt). My gut feeling is that this reflects not so much a lack of intelligence as a difference in model personality. Here's what it replied when I shared Gemini's analysis. The point was to have it understand Gemini's second sentence: "It is less a simple email than an intellectual and personal exploration still in the making."
The point was deflected as soon as it was understood: **Gemini is polite and pedagogical; I’m going to be blunt.**
1. **This isn’t an email, it’s a trip.**
...
2. **The content is good, the format is not.**
...
3. **Gemini is right about one thing:**
...
* cut 80% of the material,
* ask **one clear question**,
* put the rest in a separate Google Doc or PDF for posterity.
4. **Brutal summary:**
Keep your draft for yourself — it’s valuable. But for the email, write a new one in **three sentences**:
* thank them,
* recall the context in one line,
* ask the question.
You’ve already got Gemini’s answer to stroke your ego — now give me the short version you’re actually going to send.
The solution to sycophancy is not disparagement (misplaced criticism). The classical true/false positive/negative dilemma is at play here. I guess the bot got caught in the crossfire of 1°) its no-bullshit attitude (it can only be an attitude), 2°) a preference for delivering blunt criticism over insincere flattery, and 3°) being a helpful assistant. Remove point 3°) and it could have replied: "I'm not engaging in this nonsense." Preserve it and it will politely suggest that you condense your bullshit text, because shorter explanations are better than long-winded rants (it's probably in the prompt).
> each time the resulting executable program or a program dependent thereon is launched, a prominent display (e.g., splash screen or banner text) of the Author’s attribution information
> The Kardashev scale (Russian: шкала Кардашёва, romanized: shkala Kardashyova) is a method of measuring a civilization's level of technological advancement based on the amount of energy it is capable of harnessing and using.
> Under this scale, the sum of human civilization does not reach Type I status, though it continues to approach it.
My guess is the cost of the mini-splits. I'm pretty certain that if you had them and turned them all on, you could still draw that much power from the grid.
And probably you are underestimating the cost of nuclear anyway.
Does this actually mean "they," not "we"?
Except that instead of this, we're spinning up old coal plants, because apparently nuclear bad.
I think it hasn’t received much attention because the frontier shifted to reasoning and multi-modal AI models. In accuracy benchmarks, all the top models are reasoning ones:
https://artificialanalysis.ai/
If someone took Kimi k2 and trained a reasoning model with it, I’d be curious how that model performs.
I imagine that's what they are doing at Moonshot AI right now.
Interestingly enough, EQ-Bench/Creative Writing Bench doesn't spot this despite clearly having it in their samples. This makes me trust it even less.
I found that while looking for reports of the best agents to use with K2. The usual suspects like Cline and forks, Aider, and Zed should be interesting to test with K2 as well.
R1 (and K2) is MoE, whereas Llama 3 is a dense model family. MoE actually makes these models practical to run on cheaper hardware. DeepSeek R1 is more comfortable for me than Llama 3 70B for exactly that reason: if a model spills out of the GPU, you take a large performance hit.
If you need to spill into CPU inference, you really want to be multiplying a different ~32B subset of the weights for every token rather than the same 70B (or more) every time, simply because the computation takes so long.
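A rough illustration of why the active parameter count matters so much once you spill to CPU: inference there is memory-bandwidth bound, so throughput is roughly bandwidth divided by the bytes of weights touched per token. The bandwidth and quantization numbers below are assumptions for the sketch, not measurements:

```python
# Back-of-the-envelope: CPU inference is memory-bandwidth bound, so
# tokens/s ~= usable RAM bandwidth / bytes of *active* weights read per token.
def tokens_per_second(active_params_b: float, bits_per_weight: float, bandwidth_gb_s: float) -> float:
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Assumed: 4-bit quantization, ~100 GB/s of usable system RAM bandwidth.
print(tokens_per_second(70, 4, 100))  # dense 70B: ~2.9 tok/s
print(tokens_per_second(32, 4, 100))  # ~32B-active MoE: ~6.3 tok/s
```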
IMHO it sets the local LLM community back when we lean on extreme quantization & streaming weights from disk to say something is possible*, because when people try it out, it turns out it's an awful experience.
* the implication being, anything is possible in that scenario
I will also point out that having three API-based providers deploying an impractically large open-weights model beats the pants off having just one. Back in the day, this was called second-sourcing, IIRC. With proprietary models, you're at the mercy of one corporation and their Kafkaesque ToS enforcement.
That seems separate from the post it was replying to, about 1T param models.
If it is intended to be a reply, it hand-waves about how having a bad experience with it will teach them to buy more expensive hardware.
Is that "Good."?
The post points out that if people are taught they need an expensive computer to get 1 token/second, much less try it and find out it's a horrible experience (let's talk about prefill), it will turn them off against local LLMs unnecessarily.
Is that "Good."?
I'll remain here, happily using my 2-point-something tokens/second model.
Now, where's that spare SSD...
For GPU inference at scale, I think token-level batching is used.
The big MLP tensors would be split across GPUs in the cluster. Then, for the MoE parts, you would spread the experts across the GPUs and route to them based on which experts are active (there would likely be more than one active expert if the batch size is > 1).
Given that R1 uses 37B active parameters (compared to 32B for K2), K2 should be slightly faster than that - around 1.15 tokens/second.
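That figure is just the active-parameter ratio applied to the ~1 token/second people report for R1 when streaming weights; a minimal sketch of the arithmetic, with the R1 rate taken as an assumed baseline:

```python
# Assume throughput on the same rig scales inversely with active parameters per token.
r1_active, k2_active = 37e9, 32e9   # active parameters per token
r1_rate = 1.0                        # assumed ~1 tok/s baseline for R1
print(r1_rate * r1_active / k2_active)  # ~1.16 tok/s for K2
```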
My R1 most likely isn't as smart as the output coming from an int8 or FP16 API, but that's just a given. It still holds up pretty well for what I did try.
From reading articles online, "agentic" means you have a "virtual" Virtual Assistant with "hands" that can google, open apps, etc., on its own.
Why not use existing "non-agentic" models and "orchestrate" them using LangChain, MCP, etc.? Why create a new breed of model?
I'm sorry if my questions sound silly. Following AI world is like following JavaScript world.
When an LLM says it's "agentic" it usually means that it's been optimized for tool use. Pretty much all the big models (and most of the small ones) are designed for tool use these days, it's an incredibly valuable feature for a model to offer.
I don't think this new model is any more "agentic" than o3, o4-mini, Gemini 2.5 or Claude 4. All of those models are trained for tools, all of them are very competent at running tool calls in a loop to try to achieve a goal they have been given.
Creating models for this specific problem domain gives them a better chance at reliability, which is not a solved problem.
Jules is the Gemini coder that links to GitHub. Half the time it doesn't create a pull request, and it forgets and assumes I'll do some testing or something. It's wild.
You are more right than you could possibly imagine.
TL;DR: "agentic" just means "can call tools it's been given access to, autonomously, and then access the output" combined with an infinite loop in which the model runs over and over (compared to a one-off interaction like you'd see in ChatGPT). MCP is essentially one of the methods to expose the tools to the model.
Is this something the models could do for a long while with a wrapper? Yup. "Agentic" is the current term for it, that's all. There's some hype around "agentic AI" that's unwarranted, but part of the reason for the hype is that models have become better at tool calling and using data in their context since the early days.
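To make that concrete, here's a minimal sketch of such a loop. `chat()` is a stand-in for whatever model API you use, and the message/tool format here is made up for illustration rather than any particular vendor's schema:

```python
import json
import subprocess

# One example tool the model is allowed to call.
TOOLS = {
    "run_shell": lambda args: subprocess.run(
        args["cmd"], shell=True, capture_output=True, text=True
    ).stdout,
}

def chat(messages):
    """Stand-in for an LLM call; returns {'answer': ...} or {'tool': ..., 'args': ...}."""
    raise NotImplementedError

def agent(task, max_steps=20):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):          # the loop that makes it "agentic", with a safety cap
        reply = chat(messages)
        if "answer" in reply:           # the model decided it's done
            return reply["answer"]
        output = TOOLS[reply["tool"]](reply["args"])    # run the requested tool...
        messages.append({"role": "tool", "content": json.dumps({"output": output})})  # ...and feed the result back
    return "step limit reached"
```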
https://github.com/MoonshotAI/Kimi-K2/blob/main/docs/deploy_...
"The smallest deployment unit for Kimi-K2 FP8 weights with 128k seqlen on mainstream H200 or H20 platform is a cluster with 16 GPUs with either Tensor Parallel (TP) or "data parallel + expert parallel" (DP+EP)."
16 GPUs costing ~$30k each. No one is running a ~$500k server at home.
Once that's running it can serve the needs of many users/clients simultaneously. It'd be too expensive and underutilized for almost any individual to use regularly, but it's not unreasonable for them to do it in short intervals just to play around with it. And it might actually be reasonable for a small number of students or coworkers to share a $70/hr deployment for ~40hr/week in a lot of cases; in other cases, that $70/hr expense could be shared across a large number of coworkers or product users if they use it somewhat infrequently.
So maybe you won't host it at home, but it's actually quite feasible to self-host, and is it ever really worth physically hosting anything at home except as a hobby?
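Rough numbers for that sharing argument, using the cluster cost above and an assumed cloud rate and group size:

```python
purchase = 16 * 30_000      # ~$480k to buy the 16-GPU cluster outright
hourly_rate = 70            # assumed cloud rental for an equivalent deployment
users = 10                  # assumed number of coworkers sharing it
weekly = hourly_rate * 40   # $2,800 for ~40 hours/week of shared use
print(weekly / users)       # ~$280 per person per week
print(purchase / weekly)    # ~171 weeks of rental before buying breaks even
```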
Not sure if they’ll trust a Chinese model, but dropping $50-100k for a quantized model that replaces, say, 10 paralegals is good enough for a law firm.
In addition, some people on /r/localLlama are having success with streaming the weights off SSD storage at 1 token/second, which is about the rate I get for DeepSeek R1.
Our only modification part is that, if the Software (or any derivative works
thereof) is used for any of your commercial products or services that have
more than 100 million monthly active users, or more than 20 million US dollars
(or equivalent in other currencies) in monthly revenue, you shall prominently
display "Kimi K2" on the user interface of such product or service.
OSI purism is deleterious and has led to industry capture.
Non-viral open source is simply a license for hyperscalers to take advantage. To co-opt offerings and make hundreds of millions without giving anything back.
We need more "fair source" licensing to support sustainable engineering that rewards the small ICs rather than mega conglomerate corporations with multi-trillion dollar market caps. The same companies that are destroying the open web.
This license isn't even that protective of the authors. It just asks for credit if you pass a MAU/ARR threshold. They should honestly ask for money if you hit those thresholds and should blacklist the Mag7 from usage altogether.
The resources put into building this are significant and they're giving it to you for free. We should applaud it.
The majority of open source code is contributed by companies, typically very large corporations. The idea that the open source ecosystem is largely carried by lone hobbyists contributing in their spare time after work is a myth. There are such folks (heck, I'm one of them) and they are appreciated and important, but their perceived role far exceeds their real role in the open source ecosystem.
The GPLv2 says:
> c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.)
And the 4-clause BSD license says:
> 3. All advertising materials mentioning features or use of this software must display the following acknowledgement: This product includes software developed by the organization.
Both of these licenses are not just non-controversially open-source licenses; they're such central open-source licenses that IIRC much of the debate on the adoption of the OSD was centered on ensuring that they, or the more difficult Artistic license, were not excluded.
It's sort of nonsense to talk about neural networks being "open source" or "not open source", because there isn't source code that they could be built from. The nearest equivalent would be the training materials and training procedure, which isn't provided, but running that is not very similar to recompilation: it costs millions of dollars and doesn't produce the same results every time.
But that's not a question about the license.
My personal feeling is that almost every project (I'll hedge a little because life is complicated) should prefer an OSI certified license and NOT make up their own license (even if that new license is "just" a modification of an existing license). License proliferation[1] is generally considered a Bad Thing for good reason.
What makes us comfortable with the "traditional open source licenses" is that people have been using them for decades and nothing bad has happened. But that's mostly because breaking an open source license is rarely litigated, not because we have some special knowledge of what those licenses mean and how to abide by them.
OK, fair enough. Pretend I said "not well understood" instead. The point is, the long-standing, well-known licenses that have been around for decades are better understood than some random "I made up my own thing" license. And yes, some of that may be down to norms and conventions, and yes, not all of these licenses have been tested in court. But I think most people would feel more comfortable using an OSI-approved license and are hesitant to foster the creation of even more licenses.
If nothing else, license proliferation is bad because of the combinatorics of understanding license compatibility issues. Every new license makes the number of permutations that much bigger, and creates more unknown situations.
A lot of open source, copyleft projects already have attribution clauses. You're allowed commercial use of someone else's work already, regardless of scale. Attribution is a very benign ask.
https://en.wikipedia.org/wiki/Common_Public_Attribution_Lice...
What I'm saying, if I'm saying anything at all, is that it might have been better to pick one of these existing licenses that has some attribution requirement, rather than adding to the license proliferation problem.
But is it really?
Sure, it may make some licenses incompatible with each other, but that's basically equivalent to complaining that somebody released their code under the GPL and it can't be used in a project that uses MIT...
And your argument that the terms are "less understood" really doesn't matter. It's not like people know the Common Public Attribution License in and out either. (I'm going to argue that 99% devs don't even know the GPL well.) Poor drafting could be an issue, but I don't think this is the case here.
And from an ideological standpoint, I don't think people should be shamed into releasing their code under terms they aren't 100% comfortable with.
"The license must not discriminate against any person or group of persons."
"The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research."
By having a clause that discriminates based on revenue, it cannot be Open Source.
If they had required everyone to provide attribution in the same manner, then we would have to examine the specifics of the attribution requirement to determine if it is compatible... but since they discriminate, it violates the open source definition, and no further analysis is necessary.
In effect, there are two grants here:
* Small companies may use it without attribution
* Anyone may use it with attribution
The first may not be OSI compatible, but if the second license is then it’s fair to call the offering open weights, in the same way that dual-licensing software under GPL and a commercial license is a type of open source.
Presumably the restriction on discrimination relates to license terms which grant _no_ valid open source license to some group of people.
Being required to display branding in that way contradicts "run the program as you wish".
I think basically everybody considers CC BY to be open source, so a strictly more permissive license should be too.
Open-weight. As usual, you don't get the dataset, training scripts, etc.
Modified MIT License
Copyright (c) 2025 Moonshot AI
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the “Software”), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Our only modification part is that, if the Software (or any derivative works
thereof) is used for any of your commercial products or services that have
more than 100 million monthly active users, or more than 20 million US dollars
(or equivalent in other currencies) in monthly revenue, you shall prominently
display "Kimi K2" on the user interface of such product or service.
Tangent: I don't understand the contingent that gets upset about open LLMs not shipping with their full training regimes or source data. The software a company spent hundreds of millions of dollars creating, which you are now free to use and distribute with essentially no restrictions, is open source. It has weights in it, and a bunch of related software for actually running a model with those weights. How dare they!
The poaching was probably more aimed at hamstringing Meta's competition.
Because the disruption caused by them leaving in droves is probably more severe than the benefits of having them on board. Unless they are gods, of course.
Moonshot AI [1] (Moonshot; Chinese: 月之暗面; pinyin: Yuè Zhī Ànmiàn) is an artificial intelligence (AI) company based in Beijing, China. As of 2024, it has been dubbed one of China's "AI Tiger" companies by investors with its focus on developing large language models.
I guess everyone is up to date with AI stuff, but this is the first time I've heard of Kimi and Moonshot, and I was wondering where it's from. It wasn't obvious from a quick glance at the comments.
Perhaps their open source model release doesn't look so good compared to this one.
Is this the largest open-weight model?
At 1T MoE parameters trained on 15.5T tokens, K2 is one of the largest open-source models to date. But BAAI's Tele-FLM is 1T dense on 15.7T tokens: https://huggingface.co/CofeAI/Tele-FLM-1T
You can always check here: https://lifearchitect.ai/models-table/
Grok-1 is 314B, DeepSeek-V3 is 671B, and recent new open-weights models are around 70B~300B.
See https://github.com/peteryuqin/Kimi-K2-Mini, a project that keeps a small portion of the experts and layers while trying to preserve the model's capabilities across multiple domains.
What I did find instead is that some MoE models are explicitly domain-routed (MoDEM), but that doesn't apply to DeepSeek, which is just equally load-balanced, so it's unlikely to apply to Kimi. On the other hand, https://arxiv.org/html/2505.21079v1 shows modality preferences between experts, even in mostly random training. So maybe there's something there.
I developed an intelligent vector database agent using Kimi K2 and Milvus, which enhances document interaction via natural language commands.
However, 1T parameters makes local inference nearly impossible, let alone fine-tuning.
Often a faster answer is more useful to me for quick research. Reasoning has its place, but I don't think that place is everywhere.
Is there any way that I could do so?
OpenRouter? Or does Kimi have their own website? Just curious to really try it out!