Next-edit autocomplete differs from standard autocomplete by using your recent edits as context when predicting completions. The model is small enough to run locally while outperforming models 4x its size on both speed and accuracy.
We tested against Mercury (Inception), Zeta (Zed), and Instinct (Continue) across five benchmarks: next-edit above/below cursor, tab-to-jump for distant changes, standard FIM, and noisiness. We found exact-match accuracy correlates best with real usability because code is fairly precise and the solution space is small.
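As a rough illustration (not Sweep's exact harness), exact match simply checks whether the predicted edit equals the reference verbatim:

    # Minimal sketch of exact-match accuracy over a benchmark set.
    def exact_match_accuracy(predictions, references):
        assert len(predictions) == len(references)
        return sum(p == r for p, r in zip(predictions, references)) / len(references)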
Prompt format turned out to matter more than we expected. We ran a genetic algorithm over 30+ diff formats and found simple `original`/`updated` blocks beat unified diffs. The verbose format is just easier for smaller models to understand.
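For a rough sense of the difference, here is a hypothetical Python sketch contrasting the two styles; the tag names and file names are illustrative, not Sweep's actual format:

    # Hypothetical sketch of the two prompt styles; markers are illustrative.
    import difflib

    original = "def area(r):\n    return 3.14 * r * r\n"
    updated = "def area(r):\n    return math.pi * r * r\n"

    # Unified-diff style: compact, but the model must reason about @@ hunks and +/- prefixes.
    unified = "".join(difflib.unified_diff(
        original.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile="a/geom.py", tofile="b/geom.py"))

    # original/updated block style: verbose, but each version is just plain code.
    blocks = f"<original>\n{original}</original>\n<updated>\n{updated}</updated>\n"

    print(unified)
    print(blocks)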
Training was SFT on ~100k examples from permissively-licensed repos (4hrs on 8xH100), then RL for 2000 steps with tree-sitter parse checking and size regularization. The RL step fixes edge cases SFT can't, like generating code that doesn't parse or overly verbose outputs.
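A loose sketch of what a parse-check reward with size regularization could look like, assuming the py-tree-sitter bindings; the actual reward shaping isn't described beyond this:

    # Loose sketch, not Sweep's reward code. Assumes py-tree-sitter >= 0.22
    # plus the tree-sitter-python grammar package.
    import tree_sitter_python
    from tree_sitter import Language, Parser

    PY_LANGUAGE = Language(tree_sitter_python.language())
    parser = Parser(PY_LANGUAGE)

    def reward(completion: str, max_len: int = 400) -> float:
        tree = parser.parse(completion.encode("utf-8"))
        parses = 0.0 if tree.root_node.has_error else 1.0
        # Size regularization: penalize completions past a length budget.
        size_penalty = max(0.0, (len(completion) - max_len) / max_len)
        return parses - 0.1 * size_penalty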
We're open-sourcing the weights so the community can build fast, privacy-preserving autocomplete for any editor. If you're building for VSCode, Neovim, or something else, we'd love to see what you make with it!
People posting stuff like this is really cool, because otherwise it kinda feels like nobody gives a crap. For example, even with Cline/RooCode/KiloCode there's no good way for me to hook up an autocomplete model that runs in Ollama or maybe a remote Cerebras Code model. KiloCode doesn't have a proper model configuration option for autocomplete even though it has one for the chat and regular agentic stuff - I don't get why autocomplete is such a special case.
I guess what I'm saying is that I'm glad someone's at least trying, so I don't have to keep a Copilot subscription just because I genuinely like their autocomplete while the rest of it is basically wasted: Claude Code, Codex, and others are better for the actual chat/agentic stuff, and KiloCode and others are really nice IDE plugins.
I am the author of this Neovim plugin for edit completions. I was able to integrate it with the Sweep Edit model.
For anyone who is interested: https://github.com/leonardcser/cursortab.nvim
This is a really good plugin. I'm a diehard JetBrains user; I've tried switching to VSCode and its various forks many times because of AI, but muscle memory from years of use is hard to override. And for a lot of languages JetBrains is just much better, especially out of the box. But they dropped the ball so hard on AI it's unbelievable. Claude Code pulled it back a bit, because at least now the cutting-edge tools aren't just VSCode plugins, but I was still missing a solid autocomplete tool. Glad this is here to fill that niche. I'll very likely be switching my GitHub Copilot subscription to this.
I also really appreciate publishing open weights and allowing a privacy mode for anonymous trial users, even if it's opt-in. Usually these things seem to be reserved for paying tiers these days...
I think we're still in the early days of these systems. The models could be capable of a lot more than this "chat log" methodology.
Agree about JetBrains dropping the ball. Saddens me because I've also been a diehard user of their products since 2004.
I understand that the 1.5B is small enough to run locally... but does it actually, in the Sweep AI JetBrains plugin? That is, if I install the plugin, will the model download automatically, and will the plugin avoid phoning home?
This looks really neat, interesting technical writeup as well!
Which I've been using with Qwen3 Coder. As long as infill is supported, that should work. I'll try later today.
I threw together a VSCode extension to run it, and while the extension is rough, the model seems decent. I'm trying to keep my expectations contained; in the past, local models have been absolutely terrible for inline completion, and this seems much better already. I hope this kicks off more competition.
> We’re open sourcing the model weights so the community can build fast, privacy-preserving autocomplete for every IDE - VSCode, Neovim, Emacs, and beyond.
We can't keep calling these models "open source" if all we have is a black box and we don't know precisely how they were made.
"Open weights" are the new binary.
Again, amazing work! Waiting to see what you guys cook next.
It's really impressive so far, and so quick to respond on a Mac mini M2. And it appears to be accurate, at least for the obvious questions.
I couldn't get it to work as an autocomplete in Zed, unfortunately. It looks like it's hardwired to work with some providers, and LMStudio is not included in the prediction engines list. Has anyone got a workaround?
I am thinking that one effect is:
- it will become normal for meta-models to train a model specific to a particular task/product.
Also, differently, I'm quite sure that AGI is not available on this current path (useful tho it is), but that some algo improvements might crack ubiquitous trainable AGI. Probably including some kind of embodiment to provide world-models and emotions (which are essential to embodied survival and success).
I did buy their $100/yr AI, but it's about to run out.
ollama pull hf.co/sweepai/sweep-next-edit-1.5B
What about SFT?
Presumably basing this off Qwen is the reason it can be done so cheaply?
Would constrained decoding, say via something like xgrammar, fix the syntax generation issues instead of the RL step?
It can, but you have to consider two things here:
a) constrained decoding ensures adherence to syntax, not semantics. Say you're adding a field to an enum in Rust. You can write syntactically correct Rust code that doesn't address the new field further in the code (say in a match). You'd get syntactically correct code, but the compiler will scream at you. RL works on both.
b) if your goal is to further train the model, so it works on many tasks, RL helps with exploring new paths and training the model further. Constrained grammars help with inference, but the model doesn't "learn" anything. With RL you can also have many reward functions at the same time. Say one that rewards good syntax, one that rewards "closing" all the functions so tree-sitter doesn't complain, and one that rewards 0 errors from the compiler. The model gets to train on all 3 at the same time.
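In that setup the individual signals can just be weighted and summed per sampled completion; a hypothetical shape, where the checker functions are stand-ins rather than real APIs:

    # Hypothetical multi-signal reward; the checkers referenced below are stand-ins.
    from typing import Callable, List, Tuple

    RewardFn = Callable[[str], float]

    def combined_reward(completion: str, parts: List[Tuple[float, RewardFn]]) -> float:
        # Weighted sum of independent reward signals for one sampled completion.
        return sum(weight * fn(completion) for weight, fn in parts)

    # e.g. parts = [(1.0, parses_ok), (0.5, functions_closed), (1.0, compiles_clean)]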
> We ran a genetic algorithm over 30+ diff formats
Can you give more information about your genetic algorithm? Did you do crossover over the trained models (for example, ranking by fitness, taking the most elite 20% and creating children by mixing their weights randomly)? Did you have a 'population size' (number of instances) for the genetic algorithm, and if so, what was it?
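For concreteness, the scheme described in the question would look roughly like the sketch below; whether Sweep's GA actually worked this way, or operated on prompt formats rather than weights, isn't stated here:

    # Generic sketch of fitness ranking, elite selection, and uniform crossover
    # over flat parameter vectors; purely illustrative.
    import random

    def evolve(population, fitness, elite_frac=0.2):
        ranked = sorted(population, key=fitness, reverse=True)
        elites = ranked[: max(2, int(len(ranked) * elite_frac))]
        children = []
        while len(elites) + len(children) < len(population):
            a, b = random.sample(elites, 2)
            # Each parameter is taken from one parent at random.
            children.append([ai if random.random() < 0.5 else bi for ai, bi in zip(a, b)])
        return elites + children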
I wonder whether we are perhaps past the point of usefulness of 'next edit' code development in 2026, though.
But how do I use it instead of Copilot in VSCode?
I agree that green accounts could be regarded as suspicious and, if it were me, I'd disclose each time I mention it.
It's hard to compare without more details about the training process and the dataset, but is it? Genuine question, because I had the opposite impression. For example, I recently did a full finetuning run on a 3B model, chewing through a 146k-entry dataset (with 116k entries having reasoning traces, so they're not short) in 7 hours on a single RTX 6000.