Next-edit autocomplete differs from standard autocomplete by using your recent edits as context when predicting completions. The model is small enough to run locally while outperforming models 4x its size on both speed and accuracy.
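To make that concrete, here's a rough sketch of how recent edits could be serialized into the prompt alongside the text around the cursor (the event structure and markers are illustrative, not our exact interface):

```python
from dataclasses import dataclass

@dataclass
class EditEvent:
    """One recent edit: the file it touched and the before/after text (illustrative)."""
    path: str
    before: str
    after: str

def build_context(recent_edits: list[EditEvent], prefix: str, suffix: str,
                  max_edits: int = 5) -> str:
    """Prepend the last few edit events to the text surrounding the cursor."""
    parts = []
    for edit in recent_edits[-max_edits:]:  # keep only the most recent edits
        parts.append(f"### edit in {edit.path}\n{edit.before}\n-->\n{edit.after}")
    parts.append(f"### current file\n{prefix}<|cursor|>{suffix}")
    return "\n\n".join(parts)
```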
We tested against Mercury (Inception), Zeta (Zed), and Instinct (Continue) across five benchmarks: next-edit above the cursor, next-edit below the cursor, tab-to-jump for distant changes, standard fill-in-the-middle (FIM), and noisiness. We found exact-match accuracy correlates best with real usability, because code is precise and the solution space is small.
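Concretely, exact match just means the predicted edit string equals the reference; a minimal scoring sketch (the whitespace normalization is an illustrative choice, not necessarily what the benchmarks use):

```python
def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that exactly match the reference edit."""
    def normalize(text: str) -> str:
        # Illustrative normalization: strip trailing whitespace per line.
        return "\n".join(line.rstrip() for line in text.strip().splitlines())

    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references) if references else 0.0
```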
Prompt format turned out to matter more than we expected. We ran a genetic algorithm over 30+ diff formats and found simple `original`/`updated` blocks beat unified diffs. The verbose format is just easier for smaller models to understand.
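To show what we mean, here's the same change rendered both ways (the block markers here are illustrative, not the exact tokens in our prompt):

```python
# One edit rendered as original/updated blocks vs. a unified diff (markers illustrative).

ORIGINAL_UPDATED = """\
<<<original
def greet(name):
    print("Hello " + name)
original>>>
<<<updated
def greet(name: str) -> None:
    print(f"Hello {name}")
updated>>>
"""

UNIFIED_DIFF = """\
--- a/greet.py
+++ b/greet.py
@@ -1,2 +1,2 @@
-def greet(name):
-    print("Hello " + name)
+def greet(name: str) -> None:
+    print(f"Hello {name}")
"""
```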
Training was SFT on ~100k examples from permissively licensed repos (4 hrs on 8xH100), then RL for 2,000 steps with tree-sitter parse checking and size regularization. The RL step fixes edge cases SFT can't, like generating code that doesn't parse or producing overly verbose output.
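As a rough sketch of the reward shape, assuming recent tree-sitter Python bindings plus a grammar package like `tree_sitter_python` (the weights and length penalty are illustrative, not our exact reward):

```python
import tree_sitter_python as tspython
from tree_sitter import Language, Parser

PY_LANGUAGE = Language(tspython.language())
parser = Parser(PY_LANGUAGE)

def reward(completion: str, reference: str, length_weight: float = 0.01) -> float:
    """Illustrative reward: penalize code that fails to parse and overly long outputs."""
    tree = parser.parse(completion.encode("utf-8"))
    parse_ok = not tree.root_node.has_error  # tree-sitter flags syntax errors

    # Size regularization: penalize outputs much longer than the reference edit.
    extra_lines = max(0, len(completion.splitlines()) - len(reference.splitlines()))

    base = 1.0 if parse_ok else -1.0
    return base - length_weight * extra_lines
```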
We're open-sourcing the weights so the community can build fast, privacy-preserving autocomplete for any editor. If you're building for VSCode, Neovim, or something else, we'd love to see what you make with it!
plutodev•3h ago
The diff-format insight is especially interesting. Smaller models struggling with unified diffs lines up with what I've seen too: simpler original/updated blocks reduce noise and improve intent capture.
On the infra side, training a 1.5B model in ~4 hours on 8×H100 is impressive. For folks experimenting with similar mid-scale models, we’ve been running comparable workloads on decentralized GPU aggregators (I’ve used io.net) to avoid cloud quota limits and keep costs predictable with the tradeoff that you handle orchestration yourself.
Curious if you saw diminishing returns when including older edits as context? That cutoff seems tricky in larger repos.
kouteiheika•34m ago
It's hard to compare without more details about the training process and the dataset, but... is it? Genuine question, because I had the opposite impression. For example, I recently did a full finetuning run on a 3B model chewing through a 146k-entry dataset (116k of those entries have reasoning traces, so they're not short) in 7 hours on a single RTX 6000.