I think it's weakening junior engineers' reasoning and coding abilities as they become reliant on it without having lived for long, or at all, in the before times. I think it may be doing the same to me too.
Personally, and quietly, I have a major concern about the conflict of interest in Cursor deciding which files to add to context and then charging you for the size of that context.
As with so many products, it's cheap to start with, you become dependent on it, then one day it's not cheap and you're fucked.
Obviously I could just better review my own code, but that’s proving easier said than done to the point where I’m considering going back to vanilla Code.
I think about coding assistants like this as well. When I'm "ahead of the code," I know what I intend to write, why I'm writing it that way, etc. I have an intimate knowledge of both the problem space and the solution space I'm working in. But when I use a coding assistant, I feel like I'm "behind the code" - the same feeling I get when I'm reviewing a PR. I may understand the problem space pretty well, but I have to basically pick up the pieces of the solution presented to me, turn them over a bunch, try to identify why the solution is shaped this way, if it actually solves the problem, if it has any issues large or small, etc.
It's an entirely different way of thinking, and one where I'm a lot less confident of the actual output. It's definitely less engaging, and so I feel like I'm way less "in tune" with the solution, and so less certain that the problem is solved, completely, and without issues. And because it's less engaging, it takes more effort to work like this, and I get tired quicker, and get tempted to just give up and accept the suggestions without proper review.
I feel like these tools were built without any sort of analysis of whether they _were_ actually an improvement to the software development process as a whole. It was just assumed they must be, since they seemed to make the coding part much quicker.
It also doesn't help when reviewing such code that surprisingly complex problems are sometimes solved correctly, while surprisingly easy parts can be subtly (or very) wrong.
A hard pill to swallow is that a lot of software developers have spent most of their careers "behind the code" instead of out ahead of it. They're stuck for years in an endless "Junior Engineer" cycle of: try, compile, run, fix, try, compile, run, fix--over and over with no real understanding, no deliberate and intentional coding, no intimacy, no vision of what's going on in the silicon. AI coding is just going to keep us locked into this inferior cycle.
All it seems to help with is letting us produce A Lot Of Code very quickly. But producing code is 10% of building a wonderful software product....
Makes you look more efficient but it doesn't make you more effective. At best you're just taking extra time to verify the LLM didn't make shit up, often by... well, looking at the docs or the source.. which is what you'd do writing hand-crafted code lol.
I'm switching back to emacs and looking at other ways I can integrate AI capabilities without losing my mental acuity.
aw yeah; recently I spent half a day pulling my hair out debugging some Cursor-generated frontend code just to find out the issue was buried in some... obscure experimental CSS properties which broke a default button behavior across all major browsers (not even making this up).
Velocity goes up because you produce _so much code so quickly_, most of which seems to be working; managers are happy, developers are happy, people picking up the slack - not so much.
I obviously use LLMs to some extent during daily work, but going full-on blind mode on autopilot is gonna crash the ship at some point.
I mean, I generally avoid using mocks in tests for that exact reason, but if you expect your AI completions to always be wrong you wouldn't use them in the first place.
Beyond that, the tab completion is sometimes too eager and gets in the way of actually editing, and is particularly painful when writing up a README where it will keep suggesting completely irrelevant things. It's not for me.
Yea, this is super annoying. The tab button was already overloaded between built-in intellisense stuff and actually wanting to insert tabs/spaces, now there are 3 things competing for it.
I'll often just want to insert a tab, and end up with some random hallucination getting inserted somewhere else in the file.
But still there is too much noise now. I don't look at the screen while I'm typing so that I'm not bombarded by this eager AI trying to distract me with guesses. It's like a little kid interrupting all the time.
Also telling it not to code, or not to jump to solutions is important. If there's a file outlining how you like to approach different kinds of things, it can take it into consideration more intuitively. Takes some practice to pay attention to your internal dialogue.
https://x.com/lxeagle17/status/1899979401460371758
Students are not asking questions anymore.
Small assignments work well!
But then the big test comes and scores are at an all time low.
BUT (apologies for the caps), back in the day we didn't have calculators and now we do. And perhaps the next phase in academia is "solve this problem with the LLM of your choice - you can only use the free versions of Llama vX.Y, ChatGPT vA.B, etc. - no paid subscriptions allowed" (in the same spirit that for some exams you can use the simple calculator and not the scientific one). Because if they don't do it, they (academia/universities) will lose/bleed out even more credibility/students/participation.
The world is changing. Some 'parts' are lagging. Academia is 5 years behind (companies paying them for projects helps, though), politicians are 10-15 years behind, because the 'donors' (aka bribers) prefer a wild-wild-west for a few years before rights are protected. (Case in point: writers/actors applying a lot of pressure once they realized that politicians won't do anything until cornered.)
LLMs are replacing thinking, and for students, even the need to know the basics. From the perspective of an academic program, if they're stopping students from learning the material, they're actively harmful.
If you're saying that LLMs obviate the need to understand the basics I think that's dangerously wrong. You still need a human in the loop capable of understanding whether the output is good or bad.
I have a feeling there will be a serious shortage of strong "old-school" seniors in a few years. If students and juniors are reliant on AI, and we recruit seniors from our juniors, who will companies turn to when AI gets stuck?
Sadly, I don't think companies are going to hire graybeards to maintain AI slop code. They're just going to release low-quality garbage, and the bar for what counts as good software will get lower and lower.
It is like they will be the 'new nerds' and we will have the 'street-smarts'.
Same as StackOverflow, same as Google, same as Wikipedia for students.
The problem is not using the tools, it's what you do with the result. There will always be lazy people who will just use this result and not think anymore about it. And then there will always be people who will use those results as a springboard to what to look for in the documentation of whatever tool / language they just discovered thanks to Cursor.
You want to hire from the second category of people.
That is to say, as we strive for better and better tools along a single axis, at some point the social dynamics shift markedly, even though it’s just “more of the same”. Digital distribution was just better distribution, but it changed the very nature of journalism, music, and others. Writing on computers changed writing. And the list goes on.
“This is just the next thing in a long line of things” is how we technologists escape the disquieting notion that we are running more and more wild social experiments.
Is it though?
If I used stack overflow, for example, I still needed to understand the code well enough to translate it to my specific codebase, changing variable names at the very least.
Good prompting _does_ require engagement and, for most cases, some research, just like SO or Google.
Sure, you can throw out idle or lazy queries. The results will be worse generally.
> As with so many products, it's cheap to start with, you become dependent on it, then one day it's not cheap and you're fucked.
If it gets too expensive, then I guess the alternative becomes using something like Continue.dev or Cline with one of the providers like Scaleway that you can rent GPUs from or that have managed inference… either that, or having a pair of L4 cards in a closet somewhere (or a fancy Mac, or anything else with a decent amount of memory).
Whereas if there are no well priced options anywhere (e.g. the upfront investment for a company to buy their own GPUs to run with Ollama or something else), then that just means that running LLM based systems nowadays is economically infeasible for many.
Can you elaborate on what you're referring to? I don't use Cursor extensively, but I do pay for it, and it was a flat fee annual subscription for unlimited requests, with the "fast" queue being capped at a set number of requests per month with no reference to their size.
Claude Code does work the way you say, since you provide it your Anthropic API key. But I have not seen Cursor charging for context or completion tokens, which is actually the reason why I'm paying for it instead of using something like Aider.
Often I will decompose the problem into smaller subproblems and feed those to Cursor one by one, slowly building up the solution. That works for big tickets.
For me the time saving and force multiplier isn't necessarily in the problem solving, I can do that faster and better in most cases, but the raw act of writing code? It does that way faster than me.
The key I find is experience doing that kind of stuff to begin with, vs domain experience as well, vs little to none and wanting to learn the ropes.
Exactly!
As you learn a codebase, you and it become better together.
It can take a while, but the investment in understanding the stack you are working with does pay off.
Trying to shortcut junior devs to productivity is the plot to a cyberpunk horror movie!
Is Cursor ultimately not using GPT-4, GPT-4o, Claude 3.5/3.7 Sonnet and so on? Even if some of the autocompletion and agent features are nice, is this not too much of a push for what is essentially just another UI plugin?
Creating a good user interface, processing the code into appropriate embeddings for useful matching, and marketing are just three things that take a lot of effort to get right.
The fragmented landscape of alternatives is not attractive at all.
One thing I'm quite worried about, though, is that Cursor is a fork of VS Code. This will never be maintainable in the long run, in particular if Microsoft does not want them to continue.
Copilot is unlimited for a flat fee [1]. I've been happy with it.
They also have a free tier now, but it's limited to 50 chats a month, which I'd burn through pretty quickly.
1: https://docs.github.com/en/copilot/about-github-copilot/subs...
You don't outsource your thinking to the tool. You do the thinking and let the tool type it for you.
You said just the same in another of your posts:
> if you can begin to describe the function well
So I have to learn how to describe code rather than just writing it as I've done for years?
Like I said, it's good for the small stuff, but bad for the big stuff, for now at least.
If we keep going down this path, we might end up inventing artificial languages for the purpose of precisely and unambiguously describing a program to a computer.
It's been around for decades. In fact this was the first approach to doing AI.
In logic programming you basically write a concrete set of test cases and the compiler generates the code for which the test cases hold 'true'.
In other words you get a language to 'precisely and unambiguously describe a program', as you said. Compiler writes the code for you.
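For illustration, here's a contrived Python sketch of that idea - not a real logic-programming system, just "tests as the spec" plus a brute-force search over a tiny, made-up candidate space:

```python
# Toy "tests as the spec" sketch: search a tiny candidate space for a program
# that makes every test case hold true. Real logic programming uses unification
# and constraint search, not enumeration; this only shows the flavor of the idea.
tests = [((2, 3), 5), ((0, 0), 0), ((10, -4), 6)]

candidates = [
    lambda a, b: a - b,
    lambda a, b: a * b,
    lambda a, b: a + b,
]

def synthesize(tests, candidates):
    """Return the first candidate for which all test cases hold, else None."""
    for f in candidates:
        if all(f(*args) == expected for args, expected in tests):
            return f
    return None

add = synthesize(tests, candidates)
print(add(7, 8))  # 15
```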
You ask it to give you one block at a time.
iterate over the above list and remove all strings matching 'apple'
open file and write the above list etc etc kind of stuff.
Notice how the English here can be interpreted only one way, but the LLM is now a good, intelligent coding assistant.
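For reference, a rough Python rendering of those two directives (list contents and file name are invented for illustration, and "matching" is read as exact equality):

```python
# "iterate over the above list and remove all strings matching 'apple'"
# ("matching" is read here as exact equality)
items = ["apple", "banana", "apple", "cherry"]
items = [s for s in items if s != "apple"]

# "open file and write the above list" (file name invented for illustration)
with open("items.txt", "w") as f:
    f.write("\n".join(items))
```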
>>I think in code, I'd rather just write in code.
Continue to think, just make the LLM type out the outcome of your ideas.
Experienced developers develop fluency in their tools, such that writing such narrow natural language directives like you suggest is grossly regressive. Often, the tasks don't even go through our head in English like that, they simply flow from the fingers onto the screen in the language of our code, libraries, and architecture. Are you familiar with that experience?
What you suggest is precisely like a fluently bilingual person, who can already speak and think in beautiful, articulate, and idiomatic French, opting to converse to their Parisian friend through an English-French translation app instead of speaking to them directly.
When applied carefully, that technique might help someone who wants to learn French get more and deeper exposure than without a translation app, as they pay earnest attention to the input and output going through their app.
And that technique surely helps someone who never expects to learn French navigate their way through the conversations they need to have in a sufficient way, perhaps even opening new and eventful doors for them.
But it's an absolutely absurd technique for people whose fluency is such that there's no "translating" happening in the first place.
You can see that right?
>>You can see that right?
I get it, but this is as big a paradigm shift as Google and the Internet were to people in the 90s. Sometimes how you do things changes, and the new paradigm becomes too big of a phenomenon to neglect. That's where we are now.
You have to understand sometimes a trend or a phenomenon is so large that fighting it is pointless and somewhat resembles Luddite behaviour. Not moving on with time is how you age out and get fired. I'm not talking about a normal layoff, but more like becoming totally irrelevant to whatever is happening in the industry at large.
But in this thread, it sounds like you're trying to suggest we're already there or close to it, when in fact, once you get into the details, you (inadvertently?) admitted that we're still a long way off.
The narrow-if-common examples you cited slow experienced people down rather than speed them up. They surely make some simple tasks more accessible to inexperienced people, just like in the translation app example, and there's value in that, but it represents a curious flux at the edges of the industry -- akin to VBA or early PHP -- rather than a revolutionary one at the center.
It's impactful, but still quite far from a paradigm shift.
Some quite large organisations have had all-hands meetings and told their developers that they must use LLM support and 'produce more', we'll see what comes of it. Unlike you I consider it to be a bad thing when management and owners undermine workers through technology and control (or discipline, or whatever we ought to call the current fashion), i.e. the luddites were right and it's not a bad thing to be a luddite.
Perhaps the written word just doesn't describe the phenomenon well. Do you have any go-to videos that show non-toy examples of pairing with an AI that you think illustrate your point well?
Especially if you’re fluent in an editor like Vim, Emacs, Sublime, have setup intellisense and snippets, know the language really well and are very familiar with the codebase.
But those had obvious benefits that made the learning cost sensible. “I can describe in precise English, a for loop and have the computer write it” flat sounds backwards.
> iterate over the above list and remove all strings matching 'apple'
> open file and write the above list etc etc kind of stuff.
Honestly, I can write the code faster to do those things than I can write the natural language equivalent into a prompt (in my favored language). I doubt I could have gotten there without actually learning by doing, though. This feels an awful lot like back in the day when we were required to get graphing calculators for math class (calculus, iirc), but weren't allowed to get the TI-92 line that also had equation solvers. If you had access to those, you'd cripple your own ability to actually do it by hand, and you'd never learn.
Then again, I also started programming with notepad++ for a few years and didn't really get into using "proper" editors until after I'd built up a decent facility for translating mind-to-code, which at the time was already becoming a road less travelled.
That's pretty fast.
If you are telling me I ask it to write the entire 20,000 line script in one go, that's not how I think. Or how I go about approaching anything in my life, let alone code.
To go a far distance, I go in cycles of small distances, and I go a lot of them.
How many lines of natural language do you have to write in order to get it to generate these 50-100 lines correctly?
I find that by the time I have written the right prompt to get a decently accurate 100 lines of code from these tools, which I then have to carefully review, I could have easily written the 100 lines of code myself.
That doesn't make the tool very useful, especially because reviewing the code it generates is much slower than writing it myself.
Not to mention the fact that even if it generates perfect bug-free code (which I want to emphasize: it never ever seems to do), if I want to extend it I still have to read it thoroughly, understand it, and build my own mental model of the code structure.
I think many people need to do some maintenance work on really old projects.
"One block at a time", I'd like to see the LLM which will find for me the CRON script doing some database work hidden in a forgotten VM.
Recently I've tried to make them figure out an algorithm that can chug through a list of strings and collect certain ones, grouping lines with one pattern under the last one with another pattern in a new list. They consistently fail, and not in ways that are obvious at a glance. Fixing the code manually takes longer than just writing the code.
Usually it compiles and runs, but does the wrong thing. Sometimes they screw up recursion and only collect one string. Sometimes they add code for collecting the strings that are supposed to be grouped but don't use it, and as of yet it's consistently wrong.
They also insist on generating obtuse and wrong regex, sometimes mixing PCRE with some other scheme in the same expression, unless I make threats. I'm not sure how other people manage to synthesise code they feel good about; maybe that's something only the remote models can do, and I for sure won't send my commercial projects to those.
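For what it's worth, the kind of grouping described can be sketched by hand in about a dozen lines; the header/item patterns below are placeholders, not the actual ones from that project:

```python
import re

# Placeholder patterns for illustration: group "item" lines under the most
# recent "header" line seen so far.
HEADER = re.compile(r"^\[(.+)\]$")
ITEM = re.compile(r"^\s*-\s*(.+)$")

def group_lines(lines):
    """Collect item lines under the last header line that preceded them."""
    groups, current = [], None
    for line in lines:
        if (m := HEADER.match(line)):
            current = (m.group(1), [])
            groups.append(current)
        elif (m := ITEM.match(line)) and current is not None:
            current[1].append(m.group(1))
    return groups

print(group_lines(["[fruit]", "- apple", "- pear", "[tools]", "- hammer"]))
# [('fruit', ['apple', 'pear']), ('tools', ['hammer'])]
```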
As someone old enough to have built websites in Notepad.exe it's totally reasonable that I ask my teams to turn off syntax highlighting, brace matching, and refactoring tools in VSCode. I didn't have them when I started, so they shouldn't use them today. Modern IDE features are just making them lazy.
/s
Change comes with pros and cons. The pros need to outweigh the cons (and probably significantly so) for change to be considered progress.
Syntax highlighting has the pro of making code faster to visually parse for most people at the expense of some CPU cycles and a 10 second setting change for people for whom color variations are problematic. It doesn't take anything away. It's purely additive.
AI code generation tools provide a dubious boost to short term productivity at the expense of extra work in the medium term and skill atrophy in the long term.
My junior developers think I don't know they are using AI coding tools. I discovered it about 2 months into them doing it, and I've been tracking their productivity both before and after. In one case, one might be committing to the repository slightly more frequently. But in all cases, they still aren't completing assignments on time. Or at all. Even basic things have to be rewritten because they aren't suitable for purpose. And in our pair programming sessions, I see them frozen up now, where they weren't before they started using the tools. I can even see them habitually attempt to use the AI, but then remember I'm sitting with them, and halt.
I tried to use AI code generation once to fill in some ASP.NET Core boilerplate for setting up authentication. Should be basic stuff. Should be 3 or 4 lines of code. I've done it before, but I forgot the exact lines and had been told AI was good for this kind of lazy recall of common tasks. It gave me a stub that had a comment inside, "implement authentication here". Tried to coax the AI into doing what I wanted and easily spent 10x more time than it would have taken to look up the documentation. And it still wasn't done. I haven't touched AI code gen since.
So IDK. I'm very skeptical of the claims that AI is writing significant amounts of working code for people, or that it at all rivals even a moderately smart junior developer (say nothing of actually experienced senior). I think what's really happening is that people are spending a lot of time spinning the roulette wheel, always betting on 00, and then crowing they're a genius when it finally lands.
Most people are using it to finish work sooner, rather than using it to do more work. As a senior engineer, your job must not be to stop the use of LLMs, but to create opportunities to build newer and bigger products.
>>I can even see them habitually attempt to use the AI, but then remember I'm sitting with them, and halt.
I understand you and I grew up in a different era. But life getting easier for the young isn't exactly something we must resent. Things are only getting easier with time and have been like this for a few centuries. None of this is wrong.
>>Tried to coax the AI into doing what I wanted and easily spent 10x more time than it would have taken to look up the documentation.
Honestly this largely reads like how my dad would describe technology from the 2000s. It was always that he was better off without it. Whether that was true or false is up for debate, but the world was moving on.
I think you just hit the core point that splits people in these discussions.
For many senior engineers, we see our jobs as building better and more lasting products. Correctness, robustness, maintainability, consistency, clarity, efficiency, extensibility, adaptability. We're trying to build things that best serve our users, outperform our competition, enable effective maintenance, and include the design foresight that lets our efforts turn on a dime when conditions change while maintaining all these other benefits.
I have never considered myself striving towards "newer and bigger" projects and I don't think any of the people I choose to work with would be able to say that either. What kind of goal is that? At best, that sounds like the prototyping effort of a confused startup that's desperately looking to catch a wave it might ride, and at worst it sounds like spam.
I assure you, little of the software that you appreciate in your life has been built by senior engineers with that vision. It might have had some people involved at some stage who pushed for it, because that sort of vision can effectively kick a struggling project out of a local minimum (albeit sometimes to worse places), but it's unlikely to have been a seasoned "senior engineer" being the one making that push and (if they were) they surely weren't wearing that particular hat in doing so.
One can use ai AND build stable products at the same time. These are not exactly opposing goals, and beyond that, assuming that ai will always generate bad code is itself wrong.
Very likely, people will build more stable and larger products using ai than ever before.
I understand and empathise with you; moving on is hard, especially when these kinds of huge paradigm-changing events arrive, and especially when you are no longer in the upswing of life. But the arguments you are making are very similar to those made by boomers about desktops, the internet and even mobile phones. People have argued endlessly about how the old way was better, but things only get better with newer technology that automates more things than ever before.
I completely agree with you that "one can use ai AND build stable products at the same time", even in the context of the conversation we're having in the other reply chain.
But I think we greatly disagree about having encountered a "paradigm changing event" yet. As you can see throughout the comments here, many senior engineers recognize the tools we've seen so far for what they are, they've explored their capabilities, and they've come to understand where they fit into the work they do. And they're simply not compelling for many of us yet. They don't work for the problems we'd need them to work for yet, and are often found to be clumsy and anti-productive for the problems they can address.
It's cute and dramatic to talk about "moving on is hard" and "luddism" and some emotional reaction to a big scary imminent threat, but you're mostly talking to exceedingly practical and often very lazy people who are always looking for tools to make their work more effective. Broadly, we're open to and even excited about tools that could be revolutionary and paradigm changing, and many of us even spend our days trying to discover and build those tools. A more accurate read of what they're saying in these conversations is that we're disappointed with these tools, and in many cases just find that they don't nearly deliver on their promise yet.
I don't see how AI is making life easier for my developers. You seem to have missed the point that I have seen no external sign of them being more productive. I don't care if they feel more productive. The end result is they aren't. But it does seem to be making life harder for them because they can't seem to think for themselves anymore.
> a senior engineer your job must not be to stop the use of LLMs, but create opportunities to build newer and bigger products
Well then, we're in agreement. I should reveal to my juniors that I know they are using AI and that they should stop immediately.
Of course not. But, eventually these young people are going to take over the systems that were built and understood by their ancestors. History shows what damage can be caused to a system when people who don't fully understand and appreciate how it was built take it over. We have to prepare them with the necessary knowledge to take over the future, which includes all the warts and shit piles.
I mean, we've spent a lot of our careers trying to dig ourselves out of these shit piles because they suck so bad, but we never got rid of them, we just hid them behind some polish. But it's all still there, and vibe coders aren't going to be equipped to deal with it.
Maybe the hope is that AI will reach god-like status and just fix all of this for us magically one day, but that sounds like fixing social policy by waiting for the rapture, so we have to do something else to assure the future.
At the moment, sure. They've only been available for about 5 minutes in the grand scheme of dev tools. If you believe that AI assistants are going to be put back in the box then you are just flat out wrong. They'll improve significantly.
> I'm very skeptical of the claims that AI is writing significant amounts of working code for people
You may be right, but people write far too much code as it is. Software development should be about thinking more than typing. Maybe AI's most useful feature will be writing something that's slightly wrong in order to get devs to think about a good solution to their problem and then they can just fix the AI's code. If that results in better software then it's a huge win worth billions of dollars.
The belief that AI is worthless unless it's writing the code that a good dev would write is a trap that you should avoid.
I am extremely skeptical that LLM-based generative AI running in silicon-based digital computers will improve to a significant degree over what we have today. Ever.
GPT-2 to GPT-3 was a sea change improvement, but ever since then, new models are really only incrementally improving, despite taking exponentially more power and compute to train. Coupled with the fact that processors are only getting wider, not faster or less energy-consuming, then without an extreme change in computing technology, we aren't getting there with LLMs.
Either the processors or the underlying AI tech need to change, and there is no evidence this is the case.
> The belief that AI is worthless unless it's writing the code that a good dev would write is a trap that you should avoid.
I have no idea what you're even trying to say with this. Is this some kind of technoreligion that thinks AGI is worth the endeavor regardless of the harm that comes to people along the way?
The dev tooling has gotten better; I use the integrated copilot every day and it saves me from writing a lot of boilerplate.
But it's not really replacing me as a coder. Yeah I can go further faster. I can fill in gaps in knowledge. Mostly, I don't have to spend hours on forums and stack overflow anymore trying to work out an issue. But it's not replacing me because I still have to make fine-grained decisions and corrections along the way.
To use an analogy, it's a car but not a self-driving one -- it augments my natural ability to super-human levels; but it's not autonomous, I still have to steer it quite a lot or else it'll run into oncoming traffic. So like a Tesla.
And like you I don't see how to get there from where we are. I think we're at a local maximum here.
> And like you I don't see how to get there from where we are. I think we're at a local maximum here.
To continue the car analogy - are you really suggesting we're at 'peak car'? You don't believe that cars in 20 years time are going to be significantly better than the cars we have today? That's very pessimistic.
Thinking back to the car I had 20 years ago, it's not all that different from the car I have now.
Yes, the car I have now has a HUD, Carplay, Wireless iPhone charging, an iPhone app, adaptive cruise control, and can act as a wifi hotspot. But fundamentally it still does the same thing in the same way as 20 years ago. Even if we allow for EVs and Hybrid cars, it's still all mostly the same. Prius came out in 2000.
And now we've reached the point where computers advance like cars. We're writing code in the same languages, the same OS, the same instruction set, for the same chips as we did 20 years ago. Yes, we have new advancements like Rust, and new OSes like Android and iOS, and chipsets like ARM are big now. But iPhone/iPad/iMac, c/C++/Swift, OSX/MacOS/iOS, PowerPC/Intel/ARM.... fundamentally it's all homeomorphic - the same thing in different shapes. You take a programmer from the 70s and they will not be so out of place today. I feel like I'm channeling Bret Victor here: https://www.youtube.com/watch?v=gbHZNRda08o
And that's not for lack of advancements in languages, OSes, instruction sets, and hardware architectures; it's for a lack of investment and commercialization. I can get infinite money right now to make another bullshit AI app, but no one wants to invest in an OS play. You'll hear 10000 excuses about how MS this and Linux that and it's not practical and impossible and there's no money in it, and so on and so forth. The network effects are too high, and the in-group dynamic of keeping things the way they are is too strong.
But AGI? Now that's something investors find totally rational and logical and right around the corner. "I will make a fleet of robot taxis that can drive themselves with a camera" will get you access to unlimited wallets filled with endless cash. "I will advance operating systems past where they have been languishing for 40 years" is tumbleweeds.
Another great analogy. LLMs allow us to pick low hanging fruit faster. If we want to pick the higher fruit, we'll need fundamentally different equipment, not automated ways to pick low hanging fruit.
I wanted to make an observation that what you two are describing seems to me like it maps onto the Pareto principle quite neatly.
LLMs seem like they have rapidly (maybe exponentially) approached the 80% effectiveness threshold, but the last 20% is going to be a much higher bar to clear.
I think a lot of the disagreement around how useful these tools are is based on this. You can tell which people are happy with 80% accuracy versus those with higher standards.
Who does that? If I can’t find something within 15 minutes on the web, it’s back to reading specs, docs, and code. Or bringing out the debugger.
But the same idea applies to specs/docs -- instead of reading the specs or the docs, I'd rather be talking to an LLM trained on the specs and docs.
But in defence of those spending hours on forums, sometimes a project is not well documented, and the code isn't easy to read. In those situations though, I'll be browsing their Github issues or contacting them directly.
Often though I find it highly valuable to go read the docs, even if an LLM has given me a working example. Sometimes, I find better ways, warnings, or even information on unrelated things I want to do.
I'm saying there is a lot of value in tools that are merely 'better than the status quo'. An AI assistant doesn't need to be as good as a dev in order to be useful.
They turn you into the assistant instead of the programmer in the driver seat
Whenever you use them, you have to shift into "code review mode", which is not the role of the primary programmer on a task; it is the role of a secondary programmer reviewing a PR
It's "a lot of value" if you like being the assistant to a very inconsistent junior programmer
If your team can't recognize bullshit without syntax highlighting, they're going to struggle even when it's turned on. Once they've mastered the underlying principles, syntax highlighting will make them much more effective.
Even the writers of Star Trek, a fictional fantasy show with near-magic super AI, understood that the Engineers would eventually have to fix something the AI couldn't, without the AI helping them
All teaching works this way. Would it be totally reasonable to have junior pilots only practice with autopilot?
It’s just human nature, you can decide if it’s funny or sad or whatever
Hiding behind the sarcasm tag to take the piss out of people younger than you, I don't think that's very funny. The magnetised needle and a steady hand gag from xkcd, now that is actually funny.
If you are using LLMs to write anything more than an if/else or for block at a time, you are doing it wrong.
>>I think it's weakening junior engineers' reasoning and coding abilities as they become reliant on it without having lived for long, or at all, in the before times.
When I first started work, my employer didn't provide internet access to employees. Their argument would always be: how would you code if there was no internet connection, out there in the real world? As it turns out, they were not only worried about the wrong problem, but they got the whole paradigm of this new world wrong.
In short, it was not worth building anything at all for a world where the internet doesn't exist.
>>then one day it's not cheap ...
Again you are worried about the wrong thing. Your worry should not be what happens when it's no longer cheap, but what happens when it, as a matter of fact, gets cheaper. Which it will.
Then what value are they actually adding?
If this is all they are capable of, surely you could just write this code yourself much faster than trying to describe what you want to the LLM in natural language?
I cannot imagine any decently competent programmer gaining productivity from these tools if this is how limited they still are
Why are people so bullish on them?
That said, they are fairly context-aware as to what you are asking, so they can save a lot of RTFM and code/test cycles. At times they can look at the functions that are already built and write new ones for you, if you can begin to describe the function well.
But if you want to write a good function, one written to fit tight specifications, it's too much English. You need to describe in steps what is to be done, plus exceptions. And at some point you are just doing logic programming (https://en.wikipedia.org/wiki/Logic_programming), in the sense that the whole English text looks like a list of and/or situations plus exceptions.
So you have to go one atomic step (a decision statement and a loop) at a time. But that's a big productivity boost too, the reason being you can put lots of text in place without having to manually type it out.
>>you could just write this code yourself much faster than trying to describe what you want to the LLM in natural language?
Honestly speaking, most of coding is manually laborious if you don't know touch typing. And even if you do, it's a chore.
I remember when I started using Copilot with React, it was doing a lot of the typing work I'd otherwise have to do.
>>I cannot imagine any decently competent programmer gaining productivity from these tools if this is how limited they still are
IMO, my brain at least has, over the years, seen so many code patterns, debugging situations, and things to anticipate and assemble as I go, that having an intelligent typing assistant is a major productivity boost.
>>Why are people so bullish on them?
Eventually newer programming languages will come along and people will build larger things.
It's terrible when you get to complex code, but I'd rather spend most of my time there anyways
For example, I like LLMs because they take care of a lot of the boilerplate I have to write.
But I only have to write that boilerplate because it's part of the language design. Advances in syntax and programming systems can yield similar speedups in programming ability. I've seen a 100x boost in productivity that came down to switching to a DSL versus C++.
Maybe we need more DSLs, better programming systems, better debugging tools, and we don't really need LLMs the way LLM makers are telling us? LLMs only seem so great because our computer architecture, languages and dev tooling and hardware are stuck in the past.
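As a contrived illustration of the DSL point, a small declarative spec plus a generic interpreter can replace a pile of hand-written per-field boilerplate (the schema and field names here are invented):

```python
# Contrived sketch of the DSL argument: declare the shape once and let a small
# interpreter do the repetitive work, instead of hand-writing per-field code.
SCHEMA = {
    "name": str,
    "age": int,
    "email": str,
}

def validate(record, schema=SCHEMA):
    """Check that every declared field exists and has the declared type."""
    return [
        f"{field}: expected {kind.__name__}"
        for field, kind in schema.items()
        if not isinstance(record.get(field), kind)
    ]

print(validate({"name": "Ada", "age": "37", "email": "ada@example.com"}))
# ['age: expected int']
```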
Instead of being happy with the Von Neumann architecture, we should be exploring highly parallel computer architectures.
Instead of being happy with imperative languages, we should be investing heavily in studying other programming systems and new paradigms.
Instead of being happy coding in a 1D text buffer, we should be investing more in completely imaginative ways of building programs in AR, VR, 3D, 2D.
LLMs are going to play a part here, but I think really they are a band-aid for a larger problem, which is that we've climbed too high in one particular direction (von Neumann/imperative/text) and we are at a local maximum. We've been there since 2019 maybe.
There are many other promising peaks to climb, avenues of research that were discovered in the 60s/70s/80s/90s have been left to atrophy the past 30 years as the people who were investigating those paths refocused or are now gone.
I think all these billions invested in AI are going to vaporize, and maybe then investors will focus back on the fundamentals.
LLMs are like the antenna at the top of the Empire State Building. Yes, you can keep going up if you climb up there, but it's unstable and eventually there really is a hard limit.
If we want to go higher than that, we need to build a wider and deeper foundation first.
Cursor has been trying to do things to reduce the costs of inference, especially through context pruning. For instance, if you "attach" files to a conversation, Cursor no longer stuffs the code from those files into the prompt. Instead, it'll run function calls to open those files and read bits and pieces of the code until the model feels it has enough information. This seems like a perfectly reasonable strategy until you realize you cannot do the same thing with reasoning models, if you're limiting the reasoning to just the initial prompt.
If you prune context out of the initial prompt, then instead of reasoning on richer context, the LLM reasons only on the prompt itself (with no access to the attached files). After the thinking process, Cursor runs function calls to retrieve more context, which entirely defeats the point of "thinking" and induces the model to create incoherent plans and speculative edits in its thinking process, thus explaining Claude's bizarre over-editing behavior. I suspect this is why so many Cursor users are complaining about Claude 3.7.
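A rough sketch of the loop being described, with every name hypothetical (this is not Cursor's actual code), just to show why the reasoning happens before any file contents arrive:

```python
# Hypothetical sketch of the retrieval pattern described above; not Cursor's
# actual implementation. The point: the model "thinks" over the initial prompt
# alone, and only afterwards pulls file contents via tool calls.
def answer(prompt, attached_paths, llm, read_file):
    messages = [{
        "role": "user",
        "content": prompt + "\nAttached files: " + ", ".join(attached_paths),
    }]
    while True:
        reply = llm(messages)                 # reasoning happens here, pre-context
        if reply.get("tool") == "read_file":  # model asks for a slice of a file
            chunk = read_file(reply["path"], reply.get("lines"))
            messages.append({"role": "tool", "content": chunk})
            continue                          # more context arrives only after thinking
        return reply["content"]
```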
On top of this, Cursor has every incentive to keep the thinking effort for both o3-mini and Claude 3.7 to the very minimum so as to reduce server load.
Cursor is being hailed as one of the greatest SaaS growth stories, but their $20/mo all-you-can-eat business model puts them in such a bad place.
In general I feel like this was always the reason automatic context detection could not be good in fixed fee subscription models - providers need to constrain the context to stay profitable. I also saw that things like Claude Code happily chew through your codebase, and bank account, since they are charging by token - so they have the opposite incentive.
Keep in mind that what we call "reasoning" models today are the first iteration. There's no fundamental reason why you can't do what you stated. It's not done now, but it can be done.
There's nothing stopping you from running "thinking" in "chunks" of 1-2 paragraphs, doing some search, adding more context (maybe from a pre-reasoned cache), and continuing the reasoning from there.
There's also work being done on think - summarise - think - summarise - etc. And on various "RAG"-like thinking.
Imo most of their incentive for context pruning comes not just from reducing the token count, but from the perception that you only have to find "the right way" (tm) to build that context window automatically to reach coding panacea. They just aren't there yet.
There's nothing about this that conflicts with reasoning models, I'm not sure what you mean here.
edit: Ah, I see what you mean now.
Software like Claude Code and Cline do not face those constraints, as the cost burden is on the user.
You can also use cline with gemini-2.0-flash, which supports a huge context window. Cline will send it the full context and not prune via RAG, which helps.
Or put another way, isn't the promise of software that is capable of generating any software given a natural language description in finite time basically assuming P=NP? Because unless the time can be guaranteed to be finite, throwing GPU farms and memory at this most general problem (isn't the promise of using software to generate arbitrary software the same as the promise that any possible problem can be solved in polynomial time?) is not guaranteed to solve it in finite time.
Of course then you disrespected me with a rude ad hominem and got a rude response back. Ignoring the point and attacking the person is a concession.
For the record, I and many others use throwaways every single thread. This isn't and shouldn't be reddit.
Peak HN.
But you can at least resell that $10k Mac Studio, theoretically.
Let's say a cluster of Raspberry Pis / low-powered devices could produce results as good as Claude 3.7 Sonnet. Would it be completely infeasible to create a custom model that is trained on your own code base and might not be a fully fledged LLM, but provides similar features to Cursor?
Have we all gone bonkers sending our code to third parties? The code is the thing you want to keep secret unless you're working on an open source project.
I played with aider a few days ago. Pretty frustrating experience. It kept telling me to "add files" that are in the damn directory that I opened it in. "Add them yourself" was my response. Didn't work; it couldn't do it somehow. Probably once you dial that in, it starts working better. But I had a rough time with it creating commits with broken code, not picking up manual file changes, etc. It all felt a bit flaky and brittle. Half the problem seems to be simple cache coherence issues and me having to tell it things that it should be figuring out by itself.
The model quality seems less important than the plumbing to get the full context to the AI. And since large context windows are expensive, a lot of these tools are cutting corners all the time.
I think that's a short-term problem. Not cutting those corners is valuable enough that a logical end state is tools that don't do that and cost a bit more. Just load the whole project. Yes, it will make every question cost $2-3 or something like that. That's expensive now, but if it drops by 20x we won't care.
Basically, large models that support huge context windows of millions/tens of millions of tokens cost something like the price of a small car and use a lot of energy. That's OK. Lots of people own small cars, because they are kind of useful. AIs that have a complete, detailed context of all your code, requirements, intentions, etc. will be able to do a much better job than one that has to guess all of that from a few lines of text. That would be useful. And valuable to a lot of people.
Nvidia is rich because they have insane margins on their GPUs. They cost a fraction of what they sell them for. That means that price will crash over time. So, I'm optimistic that a lot of this stuff will improve rapidly.
That's intentional, and I like it. It limits the context dynamically to what is necessary (of course it makes mistakes). You can also add files with placeholders and in a number of other ways, but most of the time I let Aider decide. It has a repo map (https://aider.chat/docs/repomap.html), gradually builds up knowledge, and makes proposals based on this and other information it has gathered, with token costs and the context window limit in mind.
As for manual changes: Aider is opinionated regarding the role of Git in your workflow. At first glance, this repels some people, and some stick to this opinion. For others, it is exactly one of the advantages, especially in combination with the shell-like nature of the tool. But the standard Git handling can still be overridden. For me personally, the default behavior becomes more and more smooth and second nature. And the whole thing is scriptable; I am only beginning to use the possibilities.
In general: Tools have to be learned, impatient one-shot attempts are simply not enough anymore.
OTOH currently the LLM companies are probably taking a financial loss with each token. Wouldn't be surprised if the price doesn't even cover the electricity used in some cases.
Also e.g. Gemini already runs on Google's custom hardware, skipping the Nvidia tax.
That still leaves us with an ungodly amount of resources used both to build the GPUs and to run them for a few years before having to replace them with even more GPUs.
It's pretty amazing to me how quickly the big tech companies pivoted from making promises to "go green" to buying as many GPUs as possible to burn through entire power plants' worth of electricity.
"Learn when a problem is best solved manually."
Sure, but how? This is like the vacuous advice for investors: buy low and sell high
But I came to this conclusion by first letting it try to do everything and observing where it fell down.
You’re also not limited to a single tool. You can switch to different tools and even have multiple editors open at the same time.
Once the codebase is reasonably structured, it's much better at picking which files it needs to read in.
I'm surprised that this sort of pattern - you fix a bug and the AI undoes your fix - is common enough for the author to call it out. I would have assumed the model wouldn't be aggressively editing existing working code like that.
I guess that while code that compiles is easier to train for, code that's free of warnings is less so?
I remember there are other examples of changes that I have to tell the AI I made to not have it change it back again, but can't remember any specific examples.
I think the changelog said they fixed it in 0.46, but that’s clearly not the case.
I've stopped using agent mode unless it's for a POC where I just want to test an assumption. Applying each step takes a bit more time, but it means less rogue behaviour and better long-term results IME.
Relatable though.
In my test application I had a service which checked the cache, asked the repository if no data was in the cache, then used external APIs to fetch some data, combine it, and update the DB and the cache.
I asked Cursor to change from using the DateTime type to using Unix timestamps. It did the changes, but it also removed the cache checks and the calls to the external APIs, so my web app relied just on the data in the DB. When asked to add back what it removed, it broke functionality in other parts of the application.
And that is with a small simple app.
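For context, the flow described is roughly the classic cache-aside layering, something like this sketch (all names invented), and the external-API step in the middle is exactly the part the edit silently dropped:

```python
# Rough sketch of the described flow (all names invented): check the cache,
# fall back to the repository, fetch from external APIs when needed, then
# update both the DB and the cache.
def get_data(key, cache, repository, external_api, db):
    cached = cache.get(key)
    if cached is not None:
        return cached
    record = repository.find(key)
    if record is None:
        record = external_api.fetch(key)  # the step the AI edit removed
        db.save(key, record)
    cache.set(key, record)
    return record
```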
What worked for me is having it generate functions and classes ranging from tens of lines of code to low hundreds. That way I could quickly iterate on its output and check if it's actually what I wanted.
It created a prompt-check-prompt iterative workflow where I could make progress quite fast and be reasonably certain of getting what I wanted. Sometimes it required fiddling with manually including files in the context, but that was a sacrifice I was willing to make and if I messed up, I could quickly try again.
With these agentic workflows, and thinking models I'm at a loss.
To take advantage of them, you need very long and detailed prompts, they take a long time to generate and drop huge chunks of code on your head. What it generates is usually wrong due to the combination of sloppy or ambiguous requirements by me, model weaknesses, and agent issues. So I need to take a good chunk of time to actually understand what it made, and fix it.
The iteration time is longer, I have less control over what it's doing, which means I spend many minutes of crafting elaborate prompts, reading the convoluted and large output, figuring out what's wrong with it, either fixing it by hand, or modifying my prompt, rinse and repeat.
TLDR: Agents and reasoning models generate 10x as much code, that you have to spend 10x time reviewing and 10x as much time crafting a good prompt.
In theory it would come out as a wash, in practice, it's worse since the super-productive tight AI iteration cycle is gone.
Overall I haven't found these thinking models to be that good for coding, other than the initial project setup and scaffolding.
I work on one file at a time in Ask mode, not Composer/Agent. Review every change, and insist on revisions for anything that seems off. Stay in control of the process, and write manually whenever it would be quicker. I won’t accept code I don’t understand, so when exploring new domains I’ll go back with as many questions as necessary to get into the details.
I think Cursor started off this way as a productivity tool for developers, but a lot of Composer/Agent features were added along the way as it became very popular with Vibe Coders. There are inherent risks with non-coders copypasting a load of code they don’t understand, so I see this use case as okay for disposable software, or perhaps UI concept prototypes. But for things that matter and need to be maintained, I think your approach is spot on.
[0] https://forum.cursor.com/t/environment-secrets-and-code-secu...
I can see from that thread that the approach hasn't been perfect, but it seems that the last two releases have tried to address that:
“0.46.x : .cursorignore now blocks files from being added in chat or sent up for tab completions, in addition to ignoring them from indexing.”
Some VSCode extensions don't work, you need to redo all your configuration, add all your workspaces... and the gain vs Copilot is not that high
Have you programmed extensions for VSCode before? While it seems like a fairly extensible system overall, the editor component in particular is very restrictive. You can add text (that's what extensions like ErrorLens and GitLens are doing), inlay hints, and on-hover popup overlays (those can only trigger on words, and not on punctuation). What Cursor does: the automatic diff-like views of AI suggestions with graphic outlines, floating buttons, and whatnot right on top of the text editing view - is not possible in vanilla VSCode.
This was originally driven by the necessity of tighter control over editor performance. In its early days VSCode was competing with Atom - another extensible JS-powered editor from GitHub - and while Atom had an early lead due to a larger extension catalog, VSCode ultimately won the race because they managed to maintain lower latency in their text editor component. Nowadays they still don't want to introduce extra extension points to it, because newer, faster editors pop up all the time, too.
I think that's (at least part of) your answer. More friction to move back from an entirely separate app rather than disabling an extension.
Just sharing this because I think some might find it useful.
https://github.com/hibernatus-hacker/ai-hedgehog
This is a simple code assistant that doesn't get in your way and makes sure you are coding (not losing your ability to program).
You configure a Replicate API token... install the tool and point it at your code base.
When you save a file, it asks the LLM for advice and feedback on the file as a "senior developer".
Run this alongside your favorite editor to get feedback from an LLM as you're working (on open source code, nothing you don't want third parties to see).
You are still programming and using your brain but you have some feedback when you save files.
The feedback is less computationally expensive or fraught with difficulty than actually getting code from LLMs, so it should work with much less powerful models.
It would be nice if there was a search built in so it could search for useful documentation for you.
I’m going to try it tomorrow!
As things move from prototype to production ready the productivity starts to become a wash for me.
AI doesn’t do a good job organizing the code and keeping it DRY. Then it’s not easy for it to make those refactorings later. AI is good at writing code that isn’t inherently defective but if there is complexity in the code it will introduce bugs in its changes.
I use Continue for small additions and tab completions and Claude for large changes. The tab completions are a small productivity boost.
Nice to see these tips - I will start experimenting with prompts to produce better code.
- Push for DRY principles ("make code concise," "ensure good design").
- Swap models strategically; sometimes it's beneficial to design with one model and implement with another. For example, use DeepSeek R1 for planning and Claude 3.5 (or 3.7) for execution. GPT-4.5 excels at solving complex problems that other models struggle with, but it's expensive.
- Insist on proper typing; clear, well-typed code improves autocompletion and static analysis.
- Certain models, particularly Claude 3.7, overly favor nested conditionals and defensive programming. They frequently introduce nullable arguments or union types unnecessarily. To mitigate this, keep function signatures as simple and clean as possible, and validate inputs once at the entry point rather than repeatedly in deeper layers (see the sketch after this list).
- Emphasize proper exception handling. Some models (again, notably Claude 3.7) have a habit of wrapping everything in extensive try/catch blocks, resulting in nested and hard-to-debug code reminiscent of legacy JavaScript, where undefined values silently pass through multiple abstraction layers. Allowing code to fail explicitly is a blessing for debugging purposes; masking errors is like replacing a fuse with a nail.
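A small, invented illustration of the last two points: the over-defensive style on top, and the simple-signature, validate-once-at-the-entry-point style below it.

```python
from __future__ import annotations

# The over-defensive style the tips warn about: nullable arguments, repeated
# checks in every layer, and a blanket try/except that hides real bugs.
def total_defensive(prices: list[float] | None, tax: float | None = None):
    if prices is None:
        return None
    try:
        return sum(prices) * (1 + (tax or 0.0))
    except Exception:
        return None  # silently swallows errors

# Simpler signatures: validate once at the entry point, then let code fail loudly.
def total(prices: list[float], tax: float) -> float:
    return sum(prices) * (1 + tax)

def handle_request(payload: dict) -> float:
    if "prices" not in payload or "tax" not in payload:
        raise ValueError("payload must include 'prices' and 'tax'")
    return total(payload["prices"], payload["tax"])
```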
In my experience, the gap between Claude 3.7 and GPT-4.5 is substantial. Claude 3.7 behaves like an overzealous intern on stimulants. It delivers results but often includes unwanted code changes, resulting in spaghetti code with deeply nested conditionals and redundant null checks. Although initial results might appear functional, the resulting technical debt makes subsequent modifications increasingly difficult, often leaving the codebase in disarray. GPT-4.5 behaves more like a mid-level developer, thoughtfully applying good programming patterns.
Unfortunately, the cost difference is significant. For practical purposes, I typically combine models. GPT-4.5 is generally reserved for planning, complex bug fixes, and code refinement or refactoring.
In my experience, GPT-4.5 consistently outperforms thinking models like o1. Occasionally, I'll use o3-mini or DeepSeek R1, but GPT-4.5 tends to be noticeably superior (at least, on average). Of course, effectiveness depends heavily on prompts and specific problems. GPT-4.5 often possesses direct knowledge about particular libraries (even without web searching), whereas o3-mini frequently struggles without additional context.
Sometimes I could solve, in 15 minutes, a bug I had been chasing for days. In other cases it is simpler to write code by hand, because AI either does not solve the problem (even a simple one), solves it at the cost of tech debt, or takes longer than doing things manually.
AI is just one more tool in our arsenal. It is up to us to decide when to use it. Just because we have a hammer does not mean we need to use it for screws.
> Wouldn’t it be easier instead of juggling with [something] and their quirks to just write the code the old way?
This phrase, when taken religiously, would keep us writing purely in assembly - as there is always "why this new language", "why this framework", "why LLMs".
- LLM keeps forgetting/omitting parts of the code
- LLM keeps changing unrelated parts of the code
- LLM does not output correctly typed code (with Rust this can feel like throwing mud at a wall and seeing what sticks; in the end you're faster on your own)
- LLM flip-flops back and forth between two equally wrong answers when asked about a problem that is (from the perspective of the LLM) particularly hard to answer
In the end, the main thing any AI coding tool will have to solve is how to get the human in front of the LLM to trust that the output does what it is supposed to do without breaking other things.
But of course LLMs are already crazy good at what they do. I just wonder how people who have no idea what they are doing will be able to handle that power.
Another huge thing for me has been to scaffold a complex feature just to see what it would do. Just start out with literal garbage and an idea, and as long as it works you can start to see if something is going to pan out or not. Then tear it down and do it again with the new assumptions you learned. Keep doing it until you have a clear direction.
Or sometimes my brain just needs to take a break and I'll work on boilerplate stuff that I've been meaning to do, or small refactors.
I've been using Windsurf since it was released, and back then it was so far ahead of Cursor it's not even funny. Windsurf feels like it's trained on good programming practices (checking how a function is used in other parts of the project for consistency, double-checking for errors after changes are made, etc.). It's also surprisingly fast (it can "search" a 5k-file codebase in, like, 2 seconds). It even asked me once to copy and paste output from Chrome DevTools because it suspected that my interpretation of the result was not accurate (and it was right).
The only thing I truly wish is to have the same experience with locally running models. Perhaps Mac Studio 512GB will deliver :)
I asked it to refactor an authenticatedfetch block of code. It went into a loop, exhausting 15 credits (https://bsky.app/profile/jjude.com/post/3ljuhrxs3442k).
When I code with AI assistance, I "think" differently and have noticed that I have more memory bandwidth to think about the big picture rather than the details.
With AI assistance, I can keep the entire program logic in my head; otherwise I have to do expensive context switching between the main components of the program/system.
How are you "thinking" when typing prompts vs typing actual code?
By asking the AI to generate a context.md file, you get an automatically structured overview of the project, including its purpose, file organization, and key components. This makes it easier to onboard new contributors, including other LLMs.
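As a rough illustration of how little is needed to try this (my own sketch; the prompt wording, model name, and use of git ls-files are all assumptions, not anything the commenter specified):

```python
# Sketch: ask a model to write a context.md from the repository's file layout.
# Assumes the openai package and a git repository; the model name is a placeholder.
import subprocess
from openai import OpenAI

client = OpenAI()
# 'git ls-files' gives a compact view of the repo layout to include in the prompt.
file_list = subprocess.run(["git", "ls-files"], capture_output=True, text=True).stdout

prompt = (
    "Write a context.md for this repository: describe its purpose, file "
    "organization, and key components so that a new contributor (or another "
    "LLM) can get oriented quickly.\n\nFiles:\n" + file_list
)
reply = client.chat.completions.create(
    model="gpt-4o",  # placeholder model
    messages=[{"role": "user", "content": prompt}],
)
with open("context.md", "w") as f:
    f.write(reply.choices[0].message.content)
```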
Natural for Cursor to nudge users towards their paid plans, but why provide the ability to use your own API keys in the first place if you're going to make them useless later?
Rules: Allow nested .cursor/rules directories and improved UX to make it clearer when rules are being applied.
This has made things a lot easier in my monorepos.
What in the hell were they thinking?!
I'm really shocked, actually. This might push me to look at competitors.
* It has terrible support for Elixir (my fav language) because the models are only really trained on Python.
* Terrible, clunky interface... it would be nice if you didn't have to click around and do modifier Ctrl+Y stuff ALL the time.
* The code generated is still riddled with errors or naff (apart from boilerplate)... so I am still * prompt engineering * the crap out of it.. which I'm good at, but I can prompt engineer using phind.com...
* The fact that the code is largely broken the first time, and that they still haven't really fixed the context window problem, means you have to copy and paste error codes back into it.. defeating the purpose of an integrated IDE imo.
* The free demo mode stops working after generating one function... if I had been given more time to evaluate it fully I would never have signed up. I signed up to see if it was any good.. which it isn't.
Why not just use an editor that is focused on coding, and then just not use an LLM at all? Less fighting the tooling, more getting your job done with less long term landmines.
There are a lot of editors, and many of them even have native or semi-native LLM support now. Pick one.
Edit: Also, side note, why are so many people running their LLMs in the cloud? All the cutting edge models are open weight licensed, and run locally. You don't need to depend on some corporation that will inevitably rug-pull you.
Like, a 7900XTX runs you about $1000. You probably already own a GPU that cost more in your gaming rig.
???
DeepSeek R1 doesn't run locally unless you program on a dual-socket server with 1 TB of RAM, or have enough cash for a cabinet of GPUs. The trend for state-of-the-art LLMs is to get bigger over time, not smaller.
Look, I've played with llava and llama locally too, but the benchmarked performance is nowhere near what you can get from the larger cloud providers, who can serve hundred-billion+ parameter models without quantization.
Also, performance between cloud-hosted models and models I've run locally with llama.cpp actually seems pretty similar. Are you sure your model fit into your VRAM, and that nothing else was misconfigured? Not fitting into VRAM slows everything to a halt. All the coder models worth looking at fit into 24GB cards in their full-sized variants with the right quantization.
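A rough back-of-the-envelope check of that claim (my own numbers, approximate, ignoring KV cache and runtime overhead):

```python
# Approximate weight memory for a model at a given quantization level.
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

print(weight_gb(32, 4.5))   # ~18 GB: a 32B coder model at ~4.5 bits/weight leaves headroom on a 24GB card
print(weight_gb(32, 16))    # ~64 GB: the same model unquantized spills far out of VRAM
print(weight_gb(671, 4.5))  # ~377 GB: why a model the size of DeepSeek R1 needs server-class hardware
```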
There are also fused models such as https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Code... that also seem to perform interestingly.
You're preaching to the anti-choir on this, though: I do not think LLMs are ready for use yet. Maybe another few years, maybe another few decades, we'll find out I guess, but what we have today sure as hell isn't it.
No? From the coding leaderboard at https://lmarena.ai/:
...
| Rank* | Model | Score | Org | License |
|-------|-------|-------|-----|---------|
| 1 | Grok-3-Preview-02-24 | 1414 | xAI | Proprietary |
| 1 | GPT-4.5-Preview | 1413 | OpenAI | Proprietary |
| 3 | Gemini-2.0-Pro-Exp-02-05 | 1378 | Google | Proprietary |
| 3 | o3-mini-high | 1369 | OpenAI | Proprietary |
| 3 | DeepSeek-R1 | 1369 | DeepSeek | MIT |
| 3 | ChatGPT-4o-latest (2025-01-29) | 1367 | OpenAI | Proprietary |
| 3 | Gemini-2.0-Flash-Thinking-Exp | 1366 | Google | Proprietary |
| 3 | o1-2024-12-17 | 1359 | OpenAI | Proprietary |
| 3 | o3-mini | 1353 | OpenAI | Proprietary |
| 4 | o1-preview | 1355 | OpenAI | Proprietary |
| 4 | Gemini-2.0-Flash-001 | 1354 | Google | Proprietary |
| 4 | o1-mini | 1353 | OpenAI | Proprietary |
| 4 | Claude 3.7 Sonnet | 1350 | Anthropic | Proprietary |
The only one I've come across that makes me think LLMs will maybe be useful someday is DeepSeek R1, along with the re-distillations based on it.
I've seen HN's fascination with OpenAI's products, and I can't understand why. Even o1 and o3 are always too little, too late; somebody else is already doing something better and throwing it into a HF repo. Must be the Silicon Valley RDF at work.