Overall Zed is super nice and the opposite of janky, but I still found a few of the defaults were off, and Python support was still missing in a few key ways for my daily workflow.
Edit:
With the latest update to 0.185.15 it works perfectly smoothly. Excellent addition to my setup.
I have also really appreciated having something that feels much less janky, has better vim bindings, and isn't slow to start even on a very fast computer. You can completely trip Cursor up if you type really fast. On an older mid-range laptop, I ran into problems with a bunch of its auto-pair stuff, of all things.
I don't think Zeta is quite up to Windsurf's completion quality/speed.
I get that this would go against their business model, but maybe people would pay for this - it could in theory be the fastest completion since it would run locally.
We are living in a strange age where local is slower than the cloud, due to the sheer amount of compute we need to do: inference takes hundreds of milliseconds (if not seconds) on local hardware, making 100ms of network latency irrelevant.
Even for a 7B model, your expensive Mac or 4090 can't beat, for example, a box with 8x A100s running a FOSS serving stack (sglang) with TP=8, in latency.
* Cursor's Cmd-K inline-edit feature (with Claude 3.7 as my base model there) works brilliantly for "I just need this one line/method fixed/improved"
* Cursor's tab-complete (née Supermaven) is great and better than any other I've used.
* Cline w/ Gemini 2.5 is absolutely the best I've tried when it comes to a full agentic workflow. I throw a paragraph-long idea at it and it comes up with a totally workable plan & working implementation
Fundamentally, and this may be my issue to get over and not actually real, I like that Cline is a bring-your-own-API-key system and an open source project, because their incentives are to generate the best prompt, max out the context, and get the best results (because everyone working on it wants it to work well). Cursor's incentive is to get you the best results...within their budget (of $.05 per request for the max models and within your monthly spend/usage allotment for the others). That means they're going to try to trim context or drop things or do other clever/fancy cost-saving techniques that benefit Cursor, Inc. That's at odds with getting the best results, even if it only adds minor friction.
1. Follow up with Codex.
`mct "fix bad response on h2 server" --model anthropic/claude-3.7-sonnet:thinking`
Machtiani will stream the answer, then automatically apply any git patches suggested in the convo.
Then I could follow up with codex.
`codex "See unstaged git changes. Run tests to make sure it works and fix and problems with the changes if necessary."
2. Codex and MCT together
`codex "$(mct 'fix bad response on h2 server' --model deepseek/deepseek-r1 --mode answer-only)"`
In this case codex will dutifully implement mct's suggested changes, saving tokens and time.
The key to the second example is `--mode answer-only`. Without this flagged argument, mct will itself try to apply patches. But in this case codex does it, as mct withholds the patches with the aforementioned flag.
3. Refer codex to the chat.
Say you did this
`mct "fix bad response on h2 server" --model gpt-4o-mini --mode chat`
Here, I used `--mode chat`, which tells mct to stream the answer and save the chat convo, but not to apply git changes (different from --mode answer-only).
You'll see that mct prints out something like
`Response saved to .machtiani/chat/fix_bad_server_response.md`
Now you can just tell codex.
`codex "See .machtiani/chat/fix_bad_server_resonse.md, and do this or that...."`
*Conclusion*
The example concepts should cover day-to-day use cases. There are other exciting workflows, but I should really post a video on that. You could do anything with the Unix philosophy!
But mct leverages weak models well, doing things not otherwise possible. And it does even better with stronger models: it rewards stronger models, but doesn't punish smaller ones.
So basically, you can save money and do more using mct + codex. But I hear aider is a terminal tool too, so maybe try mct + aider?
I still use Cursor chat with agent mode though, but I've always been indecisive. Like the others said though, it's nice to see how Cline behaves to assist with creating your own agentic workflows.
I have seen this mentioned, but is there actually a source to back it up? I've tried Cline every now and then. While it's great, I don't find it better than Cursor (nor worse in any clear way)
It helps that the task is usually self-contained, but I guess as an engineer, it's kinda in your instinct to always divide and conquer any task.
But I daily drive Cursor because the main LLM feature I use is tab-complete, and here Cursor blows the competition out of the water. It understands what I want to do next about 95% of the time when I'm in the middle of something, including comprehensive multi-line/multi-file changes. Github Copilot, Zed, Windsurf, and Cody aren't at the same level imo.
Roo is less solid but better-integrated.
Hopefully I'll switch back soon.
It was easy to figure out exactly what it's sending to the LLM, and I like that it does one thing at a time. I want to babysit my LLMs and those "agentic" tools that go off and do dozens of things in a loop make me feel out of control.
For the occasional frontend task, I don’t mind being out of control when using agentic tools. I guess this is the origin of Karpathy’s vibe coding moniker: you surrender to the LLM’s coding decisions.
For backend tasks, which is my bread and butter, I certainly want to know what it’s sending to the LLM so it’s just easier to use the chat interface directly.
This way I am fully in control. I can cherry pick the good bits out of whatever the LLM suggests or redo my prompt to get better suggestions.
So this part of my workflow is intentionally fairly labor intensive because it involves lots of copy-pasting between my IDE and the chat interface in a browser.
just isn't true. If everything else were equal, that might possibly be true, but it turns out that system prompts are quite powerful in influencing how an LLM behaves. ChatGPT with a blank user-entered system prompt behaves differently (read: poorer at coding) than one with a tuned system prompt. Aider/Copilot/Windsurf/etc. all have custom system prompts that make them more powerful rather than less, compared to using a raw web browser, and also don't involve the overhead of copy-pasting.
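For illustration, a minimal sketch of that difference, assuming the OpenAI Node SDK (the tuned system prompt below is made up, loosely modeled on what these coding tools ship):

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

const userPrompt = "Write a function that deduplicates an array of objects by id.";

// Blank system prompt: the model falls back to its generic defaults.
const blank = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: userPrompt }],
});

// Tuned system prompt: the kind of steering Aider/Copilot/etc. bake in.
const tuned = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "system",
      content:
        "You are an expert software engineer. Respond with complete, idiomatic, " +
        "runnable code. Never elide code with placeholder comments.",
    },
    { role: "user", content: userPrompt },
  ],
});

console.log(blank.choices[0].message.content);
console.log(tuned.choices[0].message.content);
```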
Probably related to Sonnet 3.7's rampant ADHD and less to the CLI tool itself (and maybe a bit of LLMs-suck-at-Swift?)
I'm using VSC for most edits; tab-completion is done via Copilot, though I don't use it that much, as I find the predictions to be subpar or too wordy when it comes to comments. I use Aider for rubber-ducking and implementing small to mid-scope changes. Normally, I add the required files, change to architect or ask mode (depending on the problem I want to solve), explain what my problem is and how I want it solved. If the Aider answer satisfies me, I change to coding mode and allow the changes.
No magic, I have no idea how a single prompt can generate $4. I wouldn't be surprised if I'm only scratching the surface with my approach though; maybe there is a better but more costly strategy yielding better results which I just haven't realized yet.
I assume we have a very different workflow.
With deepseek: ~nothing.
My only problem was DeepSeek occasionally not answering at all, but generally it was fast (the non-thinking one, that is).
You use /tokens to see how many tokens it has in its context for the next request. You manage it by dropping files and clearing the context.
Also, --watch mode is the most productive interface: you use your own editor, with no need for extra textboxes with robot faces.
Compared to Aider, Brokk:
- Has a GUI (I know, tough sell for Aider users but it really does help when managing complex projects)
- Builds on a real static analysis engine so its equivalent to the repomap doesn't get hopelessly confused in large codebases
- Has extremely useful git integration (view git log, right click to capture context into the workspace)
- Is also OSS and supports BYOK
I'd love to hear what you think!
I get much better result by asking specific question to a model that has huge context (Gemini) and analyzing the generated code carefully. That's the opposite of the style of work you get with Cursor or Windsurf.
Is it less efficient? If you are paid by LoCs, sure. But for me the quality and long-term maintainability are far more important. And especially the Tab autocomplete feature was driving me nuts, being wrong roughly half of the time and basically just interrupting my flow.
So many of the bugs and poor results that it can introduce are simply due to improper context. When you forcibly give it the necessary context, you can clearly see it's not a model problem but a problem with the approach of gathering disparate 100-line snippets at a time.
Also, it struggles with files over 800-ish lines, which is extremely annoying.
We need some smart deepseek-like innovation in context gathering since the hardware and cost of tokens is the real bottleneck here.
To generate such files and then not be able to read them is pure stupidity.
Frustrates the hell out of me as someone who thinks that at 300-400 lines you should generally start looking at breaking things up.
Now I'm testing Claude Code’s $100 Max plan. It feels like magic - editing code and fixing compile errors until it builds. The downside is I’m reviewing the code a lot less since I just let the agent run.
So far, I’ve only tried it on vibe coding game development, where every model I’ve tested struggles. It says “I rewrote X to be more robust and fixed the bug you mentioned,” yet the bug still remains.
I suspect it will work better for backend web development I do for work: write a failing unit test, then ask the agent to implement the feature and make the test pass.
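Something like this, as a sketch (vitest assumed; `applyDiscount` and its module are hypothetical names):

```ts
// discount.test.ts -- written before the feature exists, on purpose.
import { describe, expect, it } from "vitest";
import { applyDiscount } from "./discount"; // doesn't exist yet; the agent creates it

describe("applyDiscount", () => {
  it("applies a percentage discount and never goes below zero", () => {
    expect(applyDiscount(100, 25)).toBe(75);
    expect(applyDiscount(10, 150)).toBe(0);
  });
});
```

Then the prompt is just "make discount.test.ts pass", and the agent has a concrete target to iterate against.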
Also, give Zed’s Edit Predictions a try. When refactoring, I often just keep hitting Tab to accept suggestions throughout the file.
Compared to Zed Agent, Claude Code is:
- Better at editing files. Zed would sometimes return the file content in the chatbox instead of updating it. Zed Agent also inserted a new function in the middle of an existing function.
- Better at running tests/compiling. Zed struggled with my Nix environment, and I don't remember it entering the update code -> run code -> update code feedback loop.
With this you can leave Claude Code alone for a few minutes, check back and give additional instructions. With Zed Agent it was more of a constantly monitoring / copy pasting and manually verifying everything.
*I haven't tested many of the other tools mentioned here, this is mostly my experience with Zed and copy/pasting code to AI.
I plan to test other tools when my Claude Code subscription expires next month.
I've been building SO MANY small apps and web apps in the latest months, best $20/m ever spent.
Somehow other models don't work as well with it. "auto" is the worst.
Still, I hate it when it deletes all my unit tests to "make them pass"
Or, my favorite: when you’ve been zeroing in on something actually interesting and it says at the last minute, “let’s simplify our approach”. It then proceeds to rip out all the code you’ve written for the last 15 minutes and insert a trivial simulacrum of the feature you’ve been working on that does 2% of what you originally specified.
$5 to anyone who can share a rules.md file that consistently guides Sonnet 3.7 to give up and hand back control when it has no idea what it’s doing, rather than churn hopelessly and begin slicing out nearby unrelated code like it’s trying to cut out margins around a melanoma.
Ideally, things like RooCode + Claude are much better, but you need the infinite money glitch.
I built a minimal agentic framework (with editing capability) that works for a lot of my tasks with just seven tools: read, write, diff, browse, command, ask and think.
One thing I'm proud of is the ability to make it more proactive in making changes and taking the next action by just disabling the `ask` tool, as in the sketch below.
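A hypothetical sketch of the shape of it (not the actual toolkami code; diff and browse omitted for brevity):

```ts
import { readFile, writeFile } from "node:fs/promises";
import { execSync } from "node:child_process";
import { createInterface } from "node:readline/promises";

type Tool = { name: string; run: (input: string) => Promise<string> };

const rl = createInterface({ input: process.stdin, output: process.stdout });

const allTools: Tool[] = [
  { name: "read", run: (path) => readFile(path, "utf8") },
  {
    name: "write",
    run: async (arg) => {
      const [path, ...body] = arg.split("\n"); // first line: path; rest: content
      await writeFile(path, body.join("\n"));
      return "ok";
    },
  },
  { name: "command", run: async (cmd) => execSync(cmd).toString() },
  { name: "ask", run: (question) => rl.question(question) }, // human in the loop
  { name: "think", run: async (note) => note }, // scratchpad, no side effects
];

// The proactivity switch: with `ask` filtered out of the tools offered to the
// model, it can't defer decisions to the user, so it keeps taking the next step.
const proactive = true;
const tools = proactive ? allTools.filter((t) => t.name !== "ask") : allTools;

console.log("offering tools:", tools.map((t) => t.name).join(", "));
```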
I won't say it is better than any of the VSCode forks, but it works for 70% of my tasks in an understandable manner. As for the remaining stuff, I can always use Cursor/Windsurf in a complementary manner.
It is open source; have a look at https://github.com/aperoc/toolkami if it interests you.
There are a couple of Neovim projects that allow this... Avante comes to mind right now.
I will say this: it is a different thought process to get an LLM to write code for you. And right now, the biggest issue for me is the interface. It is somehow wrong; my attention is not being directed to the most important part of what is going on....
Same as in the crazy times of frontend libraries, when there was a new one every week. Just don't jump on anything, and learn the winner in the end.
Sure, I may not be state of the art. But I can pick up whatever fast. Let someone else do all the experiments.
My reference for agent mode is Claude Code. It's far from perfect, but it uses sub-tasks and summarization using the smaller Haiku model. That feels way more like a coherent solution compared to Cursor. Also, Aider ain't bad when you're OK with a more manual process.
Windsurf: Have only used it briefly, but agent mode seems somewhat better thought out. For example, they present possible next steps as buttons. Some reviews say it's even more expensive than Cursor in agent mode.
That often seems to be better than using Cursor. I don't really understand why it calls tools when I've selected an entire file to be used as context - tool calls seem to be an unnecessary distraction in this case, and they make requests more expensive. Also, Gemini is less neurotic when I use it with very basic prompts - either Cursor's prompts make it worse, or the need to juggle tool calls distracts it from the task.
I expect, or hope for, more stability in the future, but so far, from aider to Copilot, to Claude Code, to Cursor/Windsurf/Augment, almost all of them improve (or at least change) fast and seem to borrow ideas from each other too, so any leader is temporary.
I’m thinking of building an AI IDE that helps engineers write production quality code quickly when working with AI. The core idea is to introduce a new kind of collaboration workflow.
You start with the same kind of prompt, like “I want to build this feature...”, but instead of the model making changes right away, it proposes an architecture for what it plans to do, shown from a bird’s-eye view in the 2D canvas.
You collaborate with the AI on this architecture to ensure everything is built the way you want. You’re setting up data flows, structure, and validation checks. Once you’re satisfied with the design, you hit play, and the model writes the code.
Website (in progress): https://skylinevision.ai
YC Video showing prototype that I just finished yesterday: https://www.youtube.com/watch?v=DXlHNJPQRtk
Karpathy’s post that talks about this: https://x.com/karpathy/status/1917920257257459899
Thoughts? Do you think this workflow has a chance of being adopted?
The only thing that I kept thinking about was: if a correction is needed, you have to make it fully by hand - find everything and map it. However, if the first try was way off, I would like to enter a correction from the "midpoint". So instead of fixing 50%, I would be left with maybe 10 or 20. Don't know if you get what I mean.
Eventually, you’d say, ‘add an additional layer, TopicsController, between those two files,’ and the local model would do it quickly without a problem, since it doesn’t involve complicated code generation. You’d only use powerful remote models at the end.
PS. I’m stealing the ‘antidote to “vibe coding”’ phrase :)
Windsurf I think has more features, but I find it slower compared to others.
Cursor is pretty fast, and I like how it automatically suggests completions even when moving my cursor to a line of code. (Unlike others, where you need to 'trigger' it by typing text first.)
Honorable mention: Supermaven. It was the first and fastest AI autocomplete I used. But it's no longer updated since they were acquired by Cursor.
There are already a few popular open-source extensions doing 90%+ of what Cursor is doing - Cline, Roo Code (a fork of Cline), Kilo Code (a fork of Roo Code and something I help maintain).
If you're on Arch, there's even an AUR package, so it's even fewer steps than that.
AppImages aren't sandboxed, and they can access the rest of the system just fine. After all, they're just a regular SquashFS image that gets mounted under /tmp and executed from there.
I’ve personally never felt at home in vscode. If you’re open to switching, definitely check out Zed, as others are suggesting.
I.e. it's not a "plugin" but a built-in ecosystem developed by the core team.
Speed of iterations on new features is quite impressive.
Their latest agentic editing update basically brought the Claude Code CLI into the editor.
Most corporations don't have direct access to arbitrary LLMs, but through Microsoft's GitHub Copilot they do - and you can use models through Copilot and other providers like Ollama - which is great for work.
With their expertise (the team behind pioneering tech like Electron, Atom, Teletype, Tree-sitter, building their own GPU-based cross-platform UI, etc.) and velocity, it seems that they're positioned to outpace the competition.
Personally I'd say that their tech is maybe two orders of magnitude more valuable than windsurf?
It's not only that they have it built in; it currently seems to be the best open replacement for tools like the Claude Code CLI, because you can use an arbitrary LLM with it, e.g. from Ollama, and you have great extension points (MCP servers, rules, slash commands, etc.).
- The same spec is processed differently by the same LLM when implementing from scratch. This can maybe be mitigated somewhat by adjusting the temperature slider (see the sketch after this list). But generally speaking, the same spec won't give the same result unless you are very specific.
- Same if you use different LLMs. The same spec can give entirely different results for different LLMs.
- This can probably be mitigated somewhat by getting more specific in the spec, but at some point, it is so specific as to be the code itself. Unless of course you don't care that much about the details. But if you don't, you get a slightly different app every time you implement from scratch.
- Gemini 2.5 pro has "reasoning" capabilities and introduces a lot of "thinking" tokens into the context. Let's say you start with a single line spec and iterate from there. Gemini will give you a more detailed spec based on its thinking process. But if you then take the new thinking-process spec as a new starting point for the next iteration of the spec, you get even more thinking. In short, the spec gets automatically expanded by the way of "thinking" with reasoning models.
- Produced code can have small bugs, but they are not really worth putting in the spec, because they are an implementation detail.
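For what it's worth, the temperature point as a minimal sketch (assuming an OpenAI-compatible API rather than Gemini; greedy sampling reduces, but does not eliminate, run-to-run variance):

```ts
import { readFile } from "node:fs/promises";
import OpenAI from "openai";

const client = new OpenAI();
const spec = await readFile("spec.md", "utf8"); // the spec being iterated on

const res = await client.chat.completions.create({
  model: "gpt-4o",
  temperature: 0, // greedy-ish decoding: same spec -> more similar output
  seed: 42, // best-effort determinism, only on models that support it
  messages: [
    {
      role: "system",
      content: "Implement exactly what the spec says; make no unstated design decisions.",
    },
    { role: "user", content: spec },
  ],
});

console.log(res.choices[0].message.content);
```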
I'll keep experimenting with it, but I don't think this is the holy grail of AI assisted coding.
The crazy part is my Vim setup has the Codeium plugins all still in place, and it works perfectly. I’m afraid if I update the plugin to a windsurf variant, it will completely “forget” about Puppet, its syntax, and everything it has “learned” from my daily workflow over the last couple years.
Has anyone else seen anything similar?
When we're managing 10-20 AI coding agents to get work done, the interface for each is going to need to be minimal. A lot of cursor's functionality is going to be vestigial at that point, as a tool it only makes sense as a gap-bridger for people that are still attached to manual coding.
Cursor works roughly how I've expected. It reads files and either gets it right or wrong in agent mode.
Windsurf seems restricted to reading files 50 lines at a time, and often will stop after 200 lines [0]. When dealing with existing code I've been getting poorer results than Cursor.
As to autocomplete: perhaps I haven't set up either properly (for PHP) but the autocomplete in both is good for pattern matching changes I make, and terrible for anything that require knowledge of what methods an object has, the parameters a method takes etc. They both hallucinate wildly, and so I end up doing bits of editing in Cursor/Windsurf and having the same project open in PhpStorm and making use of its intellisense.
I'm coming to the end of both trials and the AI isn't adding enough over Jetbrains PhpStorm's built in features, so I'm going back to that until I figure out how to reduce hallucinations.
0. https://www.reddit.com/r/Codeium/comments/1hsn1xw/report_fro...
> Which LLM are you?
> I am Claude, an AI assistant created by Anthropic. In this interface, I'm operating as "Junie," a helpful assistant designed to explore codebases and answer questions about projects. I'm built on Anthropic's large language model technology, specifically the Claude model family.
JetBrains' wider AI tools let you choose the model that gets used, but as far as I can tell Junie doesn't. That said, it works great.
Also, while Windsurf has more project awareness, and it's better at integrating things across files, actually trying to get it to read enough context to do so intelligently is like pulling teeth. Presumably this is a resource-saving measure but it often ends up taking more tokens when it needs to be redone.
Overall Cursor 'just works' better IME. They both have free trials though so there's little reason not to try both and make a decision yourself. Also, Windsurf's pricing is lower (and they have a generous free tier) so if you're on a tight budget it's a good option.
Cursor autocomplete stops working after trial ends.
The flat pricing of Claude Code seems tempting, but it's probably still cheaper for me to go with usage pricing. I feel like loading my Anthropic account with the minimum of $5 each time would last me 2-3 days depending on usage. Some days it wouldn't last even a day.
I'll probably give Open AI's Codex a try soon, and also circle back to Aider after not using it for a few months.
I don't know if I misunderstand something with Cursor or Copilot. It seems so much easier to use Claude Code than Cursor, as Claude Code has many more tools for figuring things out. Cursor also required me to add files to the context, which I thought it should 'figure out' on its own.
Cursor can find files on its own. But if you point it in the right direction, it gets far better results than Claude Code.
Do they publish any benchmark sheet on how it compares against others?
It went through multiple stages of upgrades, and I would say at this stage it is better than Copilot. Fundamentally it is as good as Cursor or Windsurf, but it lacks some features and cannot match their speed of release. If you're on AWS though, it's a compelling offering.
This represents one group of developers and is certainly valid for that group. To each their own
For another group, where I belong, AI is a great companion! We can handle the noise and development speed is improved as well as the overall experience.
I prefer VSCode and GitHub Copilot. My opinion is this combo will eventually eat all the rest, but that's beside the point.
Agent mode could be faster; sometimes it is rather slow to think, but not a big deal. This mode is all I use these days. Integration with the code base is a huge part of the great experience.
I've created a list of self-hostable alternatives to cursor that I try to keep updated. https://selfhostedworld.com/alternative/cursor/
AI is not useful when it does the thinking for you. It's just advanced snippets at that point. I only use LLMs to explain things or to clarify a topic that doesn't make sense right away to me. That's when it shows its real strength.
Using AI for autocomplete? I turn it off.
Copilot at work and Junie at home. I found nothing about my VSCode excursions to be better than Sublime or IntelliJ.
BTW there's a new OSS competitor in town that hit the front page a couple of days ago - Void: Open-source Cursor alternative https://news.ycombinator.com/item?id=43927926
I can’t really explain or prove it, but it was noticeable enough to me that I canceled my subscription and left Windsurf
Maybe a prompting or settings issue? Too high a temperature?
Nowadays Copilot got good enough for me that it became my daily driver. I also like that I can use my Copilot subscription in different places like Zed, Aider, Xcode
Cline seems to be the best thing, I suspect because it doesn't do any dirty tricks with trimming down the context under the hood to keep the costs down. But for the same reason, it's not exactly fun to watch the token/$ counter ticking as it works.
Plus, it's less about the actual code generation and more about how to use it effectively. I wrote a simple piece on how I use it to automate the boring parts of dev work to great effect: https://quickthoughts.ca/posts/automate-smarter-maximizing-r...
https://github.com/features/copilot/plans?cft=copilot_li.fea...
Cursor, Windsurf et al have no "moat" (in startup speak), in that a sufficiently resourced organization (e.g. Microsoft) can just copy anything they do well.
VS Code/Copilot has millions of users; Cursor etc. have hundreds of thousands. Google claims to have "hundreds of millions" of users, but we can be pretty sure that they are quoting numbers for their search product.
Haven't tried out Cursor / Windsurf yet, but I can see how I can adapt Claude Desktop specifically to my workflow with a custom MCP server.
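As a minimal sketch of what that looks like with the official TypeScript SDK (`@modelcontextprotocol/sdk`; the `run_tests` tool is a made-up example):

```ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { execSync } from "node:child_process";
import { z } from "zod";

const server = new McpServer({ name: "my-workflow", version: "1.0.0" });

// Expose a project-specific action to Claude Desktop as a tool.
server.tool("run_tests", { filter: z.string().optional() }, async ({ filter }) => {
  try {
    const out = execSync(`npm test -- ${filter ?? ""}`).toString();
    return { content: [{ type: "text" as const, text: out }] };
  } catch (err: any) {
    return { content: [{ type: "text" as const, text: String(err.stdout ?? err) }] };
  }
});

// Claude Desktop launches this over stdio via its mcpServers config entry.
await server.connect(new StdioServerTransport());
```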
Cursor is very lazy about looking beyond the current context, or sometimes any context at all; it feels like it's trying to one-shot a guess without looking deeper.
The bad thing about Windsurf is that the plans are pretty limited, and the unlimited "cascade base" felt dumb the times I used it, so ultimately I use Cursor until I hit a wall, then switch to Windsurf.
I've tried VSCode with Copilot a couple of times and it's frustrating: you have to point out individual files for edits, and project-wide requests are a pain.
My only pain is the workflow for developing mobile apps where I have to switch back and forth between Android Studio and Xcode as vscode extensions for mobile are not so good
My best experience so far is v0.dev :)
- We are in public beta and free for now.
- Fully Agentic. Controllable and Transparent. Agent does all the work, but keeps you in the loop. You can take back control anytime and guide it.
- Not an IDE, so we don't compete with the VSCode forks. The interface is just a chatbox.
- More like Replit - but full stack focussed. You can build backend services.
- Videos are up at youtube.com/@nonbios
On something like an M4 MacBook Pro, can local models replace the connection to OpenAI/Anthropic?
I have a 3950X w/ 32GB RAM with a Radeon VII & 6900XT sitting in the closet hosting smaller models, then a 5800X3D/128GB/7900XTX as my main machine.
Most any quantized model that fits in half the VRAM of a single GPU (and ideally supports flash attention, optionally speculative decoding) will give you far faster autocompletes. This is especially the case with the Radeon VII, thanks to its memory bandwidth.
https://blog.steelph0enix.dev/posts/llama-cpp-guide/#quantiz...
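One nice property: Ollama and llama.cpp's llama-server both expose an OpenAI-compatible `/v1` endpoint, so existing OpenAI-style tooling can be pointed at local hardware. A minimal sketch (the model tag is whatever you've pulled locally):

```ts
import OpenAI from "openai";

// Point the standard OpenAI client at a local server instead of the cloud.
const local = new OpenAI({
  baseURL: "http://localhost:11434/v1", // Ollama; llama-server defaults to :8080/v1
  apiKey: "ollama", // ignored by local servers, but the client requires a value
});

const res = await local.chat.completions.create({
  model: "qwen2.5-coder:7b", // any local model tag you've pulled
  messages: [{ role: "user", content: "Complete this: function binarySearch(" }],
});

console.log(res.choices[0].message.content);
```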
And I get fast enough autocomplete results for it to be useful. I have an NVIDIA RTX 4060 in a laptop with 8 gigs of dedicated memory that I use for it. I still use Claude for chat (pair programming) though, and I don't really use agents.
I don't like CLI-based tools for coding. I don't understand why they are being shilled. Claude Code is maybe better at coding from scratch, because it is raw power eating tokens like there's no tomorrow, but it is the wrong interface for building anything serious.
I like to strike a balance between coding from scratch and using AI.
If I am continuously able to break down my work into smaller pieces and build a tight testing loop, it does help me be more productive.
I can't say anything about Windsurf (as I haven't tried yet) but I can confidently say Cursor is great.
I can't use Cursor because their Linux packages are compiled against Ubuntu, and they don't run on my non-Ubuntu distro of choice.
The agents are a bit beta, it can’t solve bugs very often, and will write a load of garbage if you let it.
Gemini 2.5 + Claude 3.7 work very well
It's a matter of time before they're shuttered or their experience gets far worse.
Anything that’s not boilerplate, I still code myself.
Cursor/Windsurf et al. are pointless middlemen.
If you are using VScode, get familiar with cline. Aider is also excellent if you don’t want to modify your IDE.
Additionally, JetBrains IDEs now also have built-in local LLMs, and their auto-complete is actually fast and decent. They also added a new chat side panel in a recent update.
The goal is NOT to change your workflow or dev env, but to integrate these tools into your existing flow, despite what the narrative says.
I had always wanted to get comfortable with Vim, but it never seemed worth the time commitment, especially with how much I’ve been using AI tools since 2021 when Copilot went into beta. But recently I became so frustrated by Cursor’s bugs and tab completion performance regressions that I disabled completions, and started checking out alternatives.
This particular combination of plugins has done a nice job of mostly replicating the Cursor functionality I used routinely. Some areas are more pleasant to use, some are a bit worse, but it’s nice overall. And I mostly get to use my own API keys and control the prompts and when things change.
I still need to try out Zed’s new features, but I’ve been enjoying daily driving this setup a lot.
Getting great results both in chat, edit and now agentic mode. Don’t have to worry about any blocked extensions in the cat and mouse game with MS.
I think people who ask the "either or" question are missing the point. We're supposed to use all the AI tools, not one or two of them.
and they just released agentic editing.
Early access waitlist -> ampcode.com
Is this something wildly different to Cody, your existing solution, or just a "subtle" attempt to gain more customers?
You should also ask if people actually used both :)
All this IDE churn makes me glad to have settled on Emacs a decade ago. I have adopted LLMs into my workflow via the excellent gptel, which stays out of my way but is there when I need it. I couldn't imagine switching to another editor because of some fancy LLM integration I have no control over. I have tried Cursor and VS Codium with extensions, and wasn't impressed. I'd rather use an "inferior" editor that's going to continue to work exactly how I want 50 years from now.
Emacs and Vim are editors for a lifetime. Very few software projects have that longevity and reliability. If a tool is instrumental to the work that you do, those features should be your highest priority. Not whether it works well with the latest tech trends.
Fortunately, alien space magic seems immune, so far at least. I assume they do not like the taste, and no wonder.
right now it's dumb Unix piping only
I want an AI that can use emacs or vim with me
I would take care. Emacs has no internal boundaries by design and it comes with the ability to access files and execute commands on remote systems using your configured SSH credentials. Handing the keys to an enthusiastically helpy and somewhat cracked robot might prove so bad an idea you barely even have time to put your feet up on the dash before you go sailing through the windshield.
Which is exactly why it hasn’t been commercially developed.
I was exploring using andyk/ht (discussed on HN a few months back) to sit as a proxy that my LLM can call while I control it via xterm.js. I need to figure out how to train the LLM to output keybindings/special keys etc., but it's a promising start nonetheless; I can indeed parse a lot more info than just a command. Just imagine if AI could use all of the shell auto-complete features fed into it..
Maybe I should revisit/clean up that repo and make it public. It feels like with just some training data on special key bindings etc., an LLM should be able to type - even if char by char - at a faster speed than a human, to control TUIs.
Sure, you might not like it and think you as a human should write all the code, but a frequent experience in the industry in the past months is that productivity in teams using tools like this has greatly increased.
It is not unreasonable to think that someone deciding not to use tools like this will not be competitive in the market in the near future.
I was converting a bash script to Bun/TypeScript the other day. I was doing it the way I am used to… working on one file at a time, only bringing in the AI when helpful, reviewing every diff, and staying in overall control.
Out of curiosity, threw the whole task over to Gemini 2.5Pro in agentic mode, and it was able to refine to a working solution. The point I’m trying to make here is that it uses MCP to interact with the TS compiler and linters in order to automatically iterate until it has eliminated all errors and warnings. The MCP integrations go further, as I am able to use tools like Console Ninja to give the model visibility into the contents of any data structure at any line of code at runtime too. The combination of these makes me think that TypeScript and the tooling available is particularly suitable for agentic LLM assisted development.
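The loop itself is simple to sketch; here `askModelToFix` is a hypothetical stand-in for the LLM/agent call that actually edits the files:

```ts
import { execSync } from "node:child_process";

// Hypothetical stand-in: in the real setup this is the agent/MCP call
// that receives the errors and applies edits to the source files.
async function askModelToFix(errors: string): Promise<void> {
  console.log("feeding back to the model:\n", errors);
}

async function iterateUntilClean(maxRounds = 5): Promise<boolean> {
  for (let round = 0; round < maxRounds; round++) {
    try {
      execSync("npx tsc --noEmit && npx eslint .", { stdio: "pipe" });
      return true; // compiler and linter are both clean
    } catch (err: any) {
      const errors = `${err.stdout ?? ""}${err.stderr ?? ""}`;
      await askModelToFix(errors); // model edits files, then we re-check
    }
  }
  return false; // give up and hand control back to the human
}

console.log(await iterateUntilClean());
```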
Quite unsettling times, and I suppose it’s natural to feel disconcerted about how our roles will become different, and how we will participate in the development process. The only thing I’m absolutely sure about is that these things won’t be uninvented with the genie going back in the bottle.
Sometimes it auto-completes nonsense, but sometimes I think I'm about to tab on auto-completing a method like FooABC and it actually completes it to FoodACD, both return the same type but are completely wrong.
I have to really be paying attention to catch it selecting the wrong one. I really, really hate this. When it works it's great, but every day I'm closer to just turning it off out of frustration.
A lot of people are against change because it endangers their routine, way of working, livelihood, which might be a normal reaction. But as accountants switched to using calculators and Excel sheets, we will also switch to new tools.
Where is this 2x, 10x or even 1.5x increase in output? I don't see more products, more features, less bugs or anything related to that since this "AI revolution".
I keep seeing this being repeated ad nauseam without any real backing of hard evidence. It's all copium.
Surely if everyone is so much more productive, a single person startup is now equivalent to 1 + X right?
Please enlighten me as I'm very eager to see this impact in the real world.
On the short term. Have fun debugging that mess in a year while your customers are yelling at you! I'll be available for hire to fix the mess you made which you clearly don't have the capability to understand :-)
Additionally, what you are failing to realise is that not everyone is just vibe coding and accepting blindly what the LLM is suggesting and deploying it to prod. There are actually people with decade+ of experience who do use these tools and who found it to be an accelerator in many areas, from writing boilerplate code, to assisting with styling changes.
In any case, thanks for the heads up, definitely will not be hiring you with that snarky attitude. Your assumption that I have no capability to understand something without any context tells more about you than me, and unfortunately there is no AI to assist you with that.
I don’t think the point was “don’t use LLM tools”. I read the argument here as about the best way to integrate these tools into your workflow.
Similar to the parent, I find interfacing with a chat window sufficiently productive and prefer that to autocomplete, which is just too noisy for me.
But coding agents can indeed save some time writing well-defined code and be of great help when debugging. But then again, when they don't work on a first prompt, I would likely just write the thing in Vim myself instead of trying to convince the agent.
My point being: I find agent coding quite helpful really, if you don't go overzealous with it.
I simply cannot see how I can tell an agent to implement anything I have to do in a real day job unless it's a feature so simple I could do it in a few minutes. Even those the AI will likely screw it up since it sucks at dealing with existing code, best practices, library versions, etc.
Or if I'm working on a full stack feature, and I need some boilerplate to process a new endpoint or new resource type on the frontend, I have the AI build the api call that's similar to the other calls and process the data while I work on business logic in the backend. Then when I'm done, the frontend API call is mostly set up already
I found this works rather well, because it's a list of things in my head that are "todo, in progress" but parallelizable, so I can easily verify what it's doing.
Old fashioned variable name / function name auto complete is not affected.
I considered a small macropad to enable / disable with a status light - but honestly don't do enough work to justify avoiding work by finding / building / configuring / rebuilding such a solution. If the future is this sort of extreme autocomplete in everything I do on a computer, I would probably go to the effort.
The thing that bugs me is when I'm trying to use tab to indent with spaces, but I get a suggestion instead.
I tried to disable caps lock, then remap tab to caps lock, but no joy
Any library that breaks backwards compatibility in major version releases will likely befuddle these models. That's why I have seen them pin dependencies to older versions, and more egregiously, default to using the same stack to generate any basic frontend code. This ignores innovations and improvements made in other frameworks.
For example, in TypeScript there is now a new(ish) validation library called arktype. Gemini 2.5 Pro straight up produces garbage code for this. The type generation function accepts an object/value, but Gemini keeps insisting that it consumes a type.
So Gemini defines an optional property as `a?: string`, which is similar to what you see in TypeScript. But this will fail in arktype, because it needs its input as `'a?': 'string'`. Asking Gemini to check again is a waste of time, and you will need enough familiarity with JS/TS to understand the error and move ahead.
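Concretely (arktype 2.x, as I understand it) - the definition is a runtime value, so the optional marker lives inside the quoted key:

```ts
import { type } from "arktype";

// What Gemini keeps writing is TypeScript type syntax and won't work here:
//   const thing = type({ a?: "string" });  // syntax error
// arktype wants the optionality inside the quoted key:
const thing = type({ "a?": "string" });

const out = thing({}); // `a` may be omitted
if (out instanceof type.errors) {
  console.error(out.summary);
} else {
  console.log("valid:", out);
}
```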
Forcing development into an AI friendly paradigm seems to me a regressive move that will curb innovation in return for boosts in junior/1x engineer productivity.
I'm sure it's initially slower than vibe-coding the whole thing, but at least I end up with a maintainable code base, and I know how it works and how to extend it in the future.
I said this in another comment but I'll repeat the question: where are these 2x, 10x or even 1.5x increases in output? I don't see more products, more features, less bugs or anything related to that since this "AI revolution".
I keep seeing this being repeated ad nauseam without any real backing of hard evidence.
If this was true and every developer had even a measly 30% increase in productivity, it would be like a team of 10 is now 13. The amount of code being produced would be substantially more and as a result we should see an absolute boom in new... everything.
New startups, new products, new features, bugs fixed and so much more. But I see absolutely nothing but more bullshit startups that use APIs to talk to these models with a few instructions.
Please someone show me how I'm wrong because I'd absolutely love to magically become way more productive.
I am not a professional SWE; I am not fluent in C or Rust or bash (or even Typescript) and I don't use Emacs as my editor or tmux in the terminal;
I am just a nerdy product guy who knows enough to code dangerously. I run my own small business and the software that I've written powers the entire business (and our website).
I have probably gotten AT LEAST a 500-1000% speedup in my personal software productivity over the past year that I've really leaned into using Claude/Gemini (amazing that GPT isn't on that list anymore, but that's another topic...). I am able to spec out new features and get them live in production in hours vs. days, and for bigger stuff, days vs. weeks (or even months). It has changed the pace and way in which I'm able to build stuff. I literally wrote an entire image-editing workflow to go from RAW camera shot to fully processed product image on our ecommerce store that's cut out actual, real dozens of hours of time spent previously.
Is the code I'm producing perfect? Absolutely not. Do I have 100% test coverage? Nope. Would it pass muster if I were a software engineer at Google? Probably not.
Is it working, getting to production faster, and helping my business perform better and insanely more efficiently? Absolutely.
If I want to, let's say, create some code in a language I never worked on an LLM will definitely make me more "productive" by spewing out code for me way faster than I could write it. Same if I try to quickly learn about a topic I'm not familiar with. Especially if you don't care about the quality, maintainability, etc. too much.
But if I'm already a software developer with 15 years of experience dealing with technology I use every day, it's not going to increase my productivity in any meaningful way.
This is the dissonance I see with AI talk here. If you're not a software developer the things LLMs enable you to do are game-changers. But if you are a good software developer, in its best days it's a smarter autocomplete, a rubber-duck substitute (when you can't talk to a smart person) or a mildly faster google search that can be very inaccurate.
If you go from 0 to 1 that's literally infinitely better but if you go from 100 to 105, it's barely noticeable. Maybe everyone with these absurd productivity gains are all coming from zero or very little knowledge but for someone that's been past that point I can't believe these claims.
Cursor, Windsurf, etc tend to feel like code vomit that takes more time to sift through than working through code by myself.
Are you getting irrelevant suggestions? Those autocompletes are meant to predict the things you are about to type.
1) Stops me overthinking the solution.
2) Being able to ask it the pros and cons of different solutions.
3) The multi-x speedup means less worry about throwing away a solution/code I don't like and rewriting/refactoring.
4) Really good at completing certain kinds of "boilerplate-y" code.
5) Removes the need to know the specific language implementation, as opposed to the principle (for example pointers, structs, types, mutexes, generics, etc.). My go-to rule now is that I won't use it if I'm not familiar with the principle, rather than the language's implementation of it.
6) An absolute beast when it comes to debugging simple- to medium-complexity bugs.
I just noticed CLion moved to a community license, so I re-installed it and set up Copilot integration.
It's really noisy and somehow the same binding (tab complete) for built in autocomplete "collides" with LLM suggestions (with varying latency). It's totally unusable in this state; you'll attempt to populate a single local variable or something and end up with 12 lines of unrelated code.
I've had much better success with VSCode in this area, but the complete suggestions via LLM in either are usually pretty poor; not sure if it's related to the model choice differing for auto complete or what, but it's not very useful and often distracting, although it looks cool.
I have largely disabled it now, which is a shame, because there are also times it feels like magic and I can see how it could be a massive productivity lever if it needed a tighter confidence threshold to kick in.
But I found once it was optional I hardly ever used it.
I use Deepseek or others as a conversation partner or rubber duck, but I'm perfectly happy writing all my code myself.
Maybe this approach needs a trendy name to counter the "vibe coding" hype.
Went back to VSCode with a tuned down Copilot and use the chat or inline prompt for generating specific bits of code.
All that to say that the base of your argument is still correct: AI really isn't saving all that much time since everyone has to proof-read it so much in order to not increase the number of PR bugs from using it in the first place.
I always forget syntax for things like ssh port forwarding. Now just describe it at the shell:
$ ssh (take my local port 80 and forward it to 8080 on the machine betsy) user@betsy
or maybe:
$ ffmpeg -ss 0:10:00 -i somevideo.mp4 -t 1:00 (speed it up 2x) out.webm
I press ctrl+x x and it will replace the English with a suggested command. It's been a total game changer for git, jq, rsync, ffmpeg, regex..
For more involved stuff there's screen-query: confusing crashes, strange terminal errors, weird config scripts - it allows a joint investigation, whereas aider and friends just feel like I'm asking an AI to fuck around.
For extra data it sends uname and the procname when it captures (such as "nvim" or "ipython"), and that's it.
git cloen blahalalhah
I did a ctrl+x x and it fixed it. I'm using openrouter/google/gemma-3-27b-it:free via chutes. Not a frontier model in the slightest.
AI autocomplete is a feature, not a product (to paraphrase SJ)
I can understand Windsurf getting the valuation as they had their own Codeium model
$B for a VSCode fork? Lol
For example, VS Code has Cline & Kilo Code (disclaimer: I help maintain Kilo).
Jetbrains has Junie, Zencoder, etc.
Cursor and Windsurf are both good, but do what most people do and use Cursor for a month to start with.
I use Cursor and I like it a lot.