Next-edit autocomplete differs from standard autocomplete by using your recent edits as context when predicting completions. The model is small enough to run locally while outperforming models 4x its size on both speed and accuracy.
We tested against Mercury (Inception), Zeta (Zed), and Instinct (Continue) across five benchmarks: next-edit above/below cursor, tab-to-jump for distant changes, standard FIM, and noisiness. We found exact-match accuracy correlates best with real usability because code is fairly precise and the solution space is small.
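As a rough illustration (not Sweep's exact harness), exact match simply checks whether the predicted edit equals the reference verbatim:

    # Minimal sketch of exact-match accuracy over a benchmark set.
    def exact_match_accuracy(predictions, references):
        assert len(predictions) == len(references)
        return sum(p == r for p, r in zip(predictions, references)) / len(references)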
Prompt format turned out to matter more than we expected. We ran a genetic algorithm over 30+ diff formats and found simple `original`/`updated` blocks beat unified diffs. The verbose format is just easier for smaller models to understand.
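For a rough sense of the difference, here is a hypothetical Python sketch contrasting the two styles; the tag names and file names are illustrative, not Sweep's actual format:

    # Hypothetical sketch of the two prompt styles; markers are illustrative.
    import difflib

    original = "def area(r):\n    return 3.14 * r * r\n"
    updated = "def area(r):\n    return math.pi * r * r\n"

    # Unified-diff style: compact, but the model must reason about @@ hunks and +/- prefixes.
    unified = "".join(difflib.unified_diff(
        original.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile="a/geom.py", tofile="b/geom.py"))

    # original/updated block style: verbose, but each version is just plain code.
    blocks = f"<original>\n{original}</original>\n<updated>\n{updated}</updated>\n"

    print(unified)
    print(blocks)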
Training was SFT on ~100k examples from permissively-licensed repos (4hrs on 8xH100), then RL for 2000 steps with tree-sitter parse checking and size regularization. The RL step fixes edge cases SFT can't, like generating code that doesn't parse or overly verbose outputs.
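A loose sketch of what a parse-check reward with size regularization could look like, assuming the py-tree-sitter bindings; the actual reward shaping isn't described beyond this:

    # Loose sketch, not Sweep's reward code. Assumes py-tree-sitter >= 0.22
    # plus the tree-sitter-python grammar package.
    import tree_sitter_python
    from tree_sitter import Language, Parser

    PY_LANGUAGE = Language(tree_sitter_python.language())
    parser = Parser(PY_LANGUAGE)

    def reward(completion: str, max_len: int = 400) -> float:
        tree = parser.parse(completion.encode("utf-8"))
        parses = 0.0 if tree.root_node.has_error else 1.0
        # Size regularization: penalize completions past a length budget.
        size_penalty = max(0.0, (len(completion) - max_len) / max_len)
        return parses - 0.1 * size_penalty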
We're open-sourcing the weights so the community can build fast, privacy-preserving autocomplete for any editor. If you're building for VSCode, Neovim, or something else, we'd love to see what you make with it!
People posting stuff like this is really cool, because otherwise it kinda feels like nobody gives a crap. For example, even with Cline/RooCode/KiloCode there's no good way for me to hook up an autocomplete model that runs in Ollama or maybe a remote Cerebras Code model. KiloCode doesn't have a proper model configuration option for autocomplete even though it has one for the chat and regular agentic stuff - I don't get why autocomplete is such a special case.
I guess what I'm saying is that I'm glad someone's at least trying, so I don't have to keep a Copilot subscription just because I genuinely like their autocomplete while the rest of it is basically wasted: Claude Code, Codex, and others are better for the actual chat/agentic stuff, and KiloCode and others are really nice IDE plugins.
I am the author of this Neovim plugin for edit completions. I was able to integrate it with the Sweep Edit model.
For anyone who is interested: https://github.com/leonardcser/cursortab.nvim
This is a really good plugin. I'm a diehard JetBrains user; I've tried switching to VSCode and its various forks many times because of AI, but muscle memory from years of use is hard to override. And for a lot of languages JetBrains is just much better, especially out of the box. But they dropped the ball so hard on AI it's unbelievable. Claude Code pulled it back a bit, because at least now the cutting-edge tools aren't just VSCode plugins, but I was still missing a solid autocomplete tool. Glad this is here to fill that niche. I'll very likely be switching my GitHub Copilot subscription to this.
I also really appreciate publishing open weights and allowing a privacy mode for anonymous trial users, even if it's opt-in. Usually these things seem to be reserved for paying tiers these days...
I think we're still in the early days of these systems. The models could be capable of a lot more than this "chat log" methodology.
Agree about JetBrains dropping the ball. Saddens me because I've also been a diehard user of their products since 2004.
I understand that the 1.5B is small enough to run locally... but does it actually, in the Sweep AI JetBrains plugin? That is, if I install the plugin, will the model download automatically, and will the plugin avoid phoning home?
This looks really neat, interesting technical writeup as well!
Which I've been using with Qwen3 Coder. As long as infill is supported, that should work. I'll try later today.
I threw together a VSCode extension to run it, and while the extension is rough, the model seems decent. I'm trying to keep my expectations contained; in the past, local models have been absolutely terrible for inline completion, and this seems much better already. I hope this kicks off more competition.
> We’re open sourcing the model weights so the community can build fast, privacy-preserving autocomplete for every IDE - VSCode, Neovim, Emacs, and beyond.
We can't keep calling these models "open source" if all we have is a black box and we don't know precisely how they were made.
"Open weights" are the new binary.
Again, amazing work! Waiting to see what you guys cook next.
It's really impressive so far, and so quick to respond on a Mac mini M2. And it appears to be accurate, at least for the obvious questions.
I couldn't get it to work as an autocomplete in Zed, unfortunately. It looks like it's hardwired to work with some providers, and LMStudio is not included in the prediction engines list. Has anyone got a workaround?
I am thinking that one effect is:
- it will become normal for meta-models to train a model specific to a particular task/product.
Also, differently, I'm quite sure that AGI is not available on this current path (useful tho it is), but that some algo improvements might crack ubiquitous trainable AGI. Probably including some kind of embodiment to provide world-models and emotions (which are essential to embodied survival and success).
I did buy their $100/yr AI, but it's about to run out.
ollama pull hf.co/sweepai/sweep-next-edit-1.5B
What about SFT?
Presumably basing this off Qwen is the reason it can be done so cheaply?
Would constrained decoding, say via something like xgrammar, fix the syntax generation issues instead of the RL step?
It can, but you have to consider two things here:
a) constrained decoding ensures adherence to syntax, not semantics. Say you're adding a field to an enum in Rust. You can write syntactically correct Rust code that doesn't address the new field further in the code (say in a match). You'd get syntactically correct code, but the compiler will scream at you. RL works on both.
b) if your goal is to further train the model, so it works on many tasks, RL helps with exploring new paths and training the model further. Constrained grammars help with inference, but the model doesn't "learn" anything. With RL you can also have many reward functions at the same time. Say one that rewards good syntax, one that rewards "closing" all the functions so tree-sitter doesn't complain, and one that rewards 0 errors from the compiler. The model gets to train on all 3 at the same time.
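In that setup the individual signals can just be weighted and summed per sampled completion; a hypothetical shape, where the checker functions are stand-ins rather than real APIs:

    # Hypothetical multi-signal reward; the checkers referenced below are stand-ins.
    from typing import Callable, List, Tuple

    RewardFn = Callable[[str], float]

    def combined_reward(completion: str, parts: List[Tuple[float, RewardFn]]) -> float:
        # Weighted sum of independent reward signals for one sampled completion.
        return sum(weight * fn(completion) for weight, fn in parts)

    # e.g. parts = [(1.0, parses_ok), (0.5, functions_closed), (1.0, compiles_clean)]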
> We ran a genetic algorithm over 30+ diff formats
Can you give more information about your genetic algorithm? Did you do crossover over the trained models (for example, ranking by fitness, taking the most elite 20% and creating children by mixing their weights randomly)? Did you have a 'population size' (number of instances) for the genetic algorithm, and if so, what was it?
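For concreteness, the scheme described in the question would look roughly like the sketch below; whether Sweep's GA actually worked this way, or operated on prompt formats rather than weights, isn't stated here:

    # Generic sketch of fitness ranking, elite selection, and uniform crossover
    # over flat parameter vectors; purely illustrative.
    import random

    def evolve(population, fitness, elite_frac=0.2):
        ranked = sorted(population, key=fitness, reverse=True)
        elites = ranked[: max(2, int(len(ranked) * elite_frac))]
        children = []
        while len(elites) + len(children) < len(population):
            a, b = random.sample(elites, 2)
            # Each parameter is taken from one parent at random.
            children.append([ai if random.random() < 0.5 else bi for ai, bi in zip(a, b)])
        return elites + children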
I wonder whether we are perhaps past the point of usefulness of 'next edit' code development in 2026, though.
But how do I use it instead of Copilot in VSCode?
I agree that green accounts could be regarded as suspicious and, if it were me, I'd disclose each time I mention it.
It's hard to compare without more details about the training process and the dataset, but is it? Genuine question, because I had the opposite impression. For example, I recently did a full finetuning run on a 3B model, chewing through a 146k-entry dataset (with 116k entries having reasoning traces, so they're not short) in 7 hours on a single RTX 6000.