Features are vertical slices through the software cake, but the cake is actually made out of horizontal layers. Creating a bunch of servings of cake and then trying to stick them together just results in a fragile mess that's difficult to work with and easy to break.
But... reviewing code is harder than writing code. Expressing how I want something to be done in natural language is incredibly hard.
So over time I'm spending a lot of energy on those things, and only getting it 80% right.
Not to mention I'm constantly in this highly suspicious mode, trying to pierce through the veil of my own prompt and the generated code, because it's the edge cases that make work hard.
The end result is exhaustion. There is no recharge. Plans are front-loaded, and then you switch to auditing mode.
Whereas with code you front-load a good amount of design, but you can make changes as you go, and since you know your own code the effort to make them is much lower.
Working with LLM-generated code is mostly the same. The more sophisticated the autocomplete, the more mental overhead is spent on understanding its output. There is an advantage: you are spared having to argue with a possibly defensive peer about what you believe is best. There is also a disadvantage: you don't feel like you are helping someone grow; instead you are an unpaid contributor (unpaid for that work in particular, anyway) to a product by Microsoft (or similar) generally intended, in the longer term, to push you and/or your peers out of a job. And there is no single mind you can build rapport with and, after a while, learn to read the approaches and vibes of.
Surprise, surprise… that is why programming languages were created.
Programming languages were created because of the different problem of "it's very hard to get computers to understand natural language even if you know how to express what you want in it".
Any difficulty with clearly expressing things in natural language (a real problem, and one for which there have long been solutions between humans that are different from programming languages) was technically unreachable at the user->machine interface for most of the history of computing, because of that more fundamental problem. It's arguable that LLMs are now at the level where it is potentially an issue with computers, but it took decades of having programming languages to build up the technical capacity to even experience the problem; it is not the problem programming languages address.
I much prefer to choose tasks that can be done with 25%+ context left and then just start the next task with fresh context.
If I'm getting low on context, I have it summarize the plan and progress into a text file rather than use /compact, then start a fresh context and reference that file, which I can edit and retry from if I'm not getting good results.
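Something like this (a made-up sketch; the file name and headings are arbitrary, use whatever works for you):

    # progress.md (hypothetical)
    Goal: swap the hand-rolled config parser for a single loader
    Done:
      - loader written, unit tests passing
    Next:
      - wire the loader into the CLI entry point
    Gotchas:
      - old parser silently ignored unknown keys; new one errors

Then the fresh session just starts with "read progress.md and continue from Next".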
thanks for the article, it's a good one
yes, just as was said each and every previous time OpenAI/Anthropic shit out a new model
"now it doesn't suck!"
The hedonic treadmill ensures it feels the same way each time.
But that doesn’t mean the models aren’t improving, nor that the scope isn’t expanding. If you compare today’s tools to those a year ago, the difference is stark.
They know that it's a significant, but not revolutionary, improvement.
If you supervise and manage your agents closely on well-scoped (small) tasks, they are pretty handy.
If you need a prototype and don't care about code quality or maintenance, they are great.
Anyone claiming 2x, 5x, 10x, etc. is absolutely kidding themselves for any non-trivial software.
Compared to just doing it yourself though?
Imagine having to micromanage a junior developer like this to get good results
Ridiculous tbh
I'd rather use it the other way around: I'm the one in charge, and the AI reviews for logical flaws or things I would have missed. I don't even have to think about the context window, since it only needs to look at my new code logic.
So yeah, 3 years after the first ChatGPT and Copilot, I don't feel huge changes regarding "automated" AI programming, and I don't have any AI tool in my IDE; I prefer to have a chat using their website, to brainstorm, or occasionally to find a solution to something I'm stuck on.
It's good enough that it helps, particularly in areas or languages that I'm unfamiliar with. But I'm constantly fighting with it.
Impressively, it recognized the structure of the code and correctly identified it as a component of an audio codec library, and provided a reasonably complete description of many minute details specific to this codec and the work that the function was doing.
Rather less impressively, it decided to ignore my request and write a function that used C++ features throughout, such as type inference and lambdas, or should I say "lambdas", because it was actually just a function-defined-within-a-function that tried to access and mutate variables outside of its own scope, as if we were writing JavaScript or something. Even apart from that, the code was rife with the sorts of issues that even a default invocation of gcc would flag.
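To give a flavor of the "lambda" problem, the shape of it was roughly this (a made-up reconstruction, not the actual generated code): a nested function mutating a local from the enclosing scope, which only compiles at all because GCC supports nested functions as a nonstandard extension:

    #include <stdio.h>

    static int sum_abs(const int *buf, int n) {
        int total = 0;               /* outer local that the "lambda" mutates */
        void accumulate(int x) {     /* nested function: a GCC extension, not ISO C */
            total += x < 0 ? -x : x; /* closes over the enclosing scope, JS-style */
        }
        for (int i = 0; i < n; i++)
            accumulate(buf[i]);
        return total;
    }

    int main(void) {
        int samples[] = {3, -1, 4, -1};
        printf("%d\n", sum_abs(samples, 4)); /* prints 9 */
        return 0;
    }

clang rejects this outright, and gcc with -Wpedantic warns that ISO C forbids nested functions.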
I can see why people would be wowed by this on its face. I wouldn't expect any average developer to have such a depth of knowledge and breadth of pattern-matching ability to be able to identify the specific task that this specific function in this specific audio codec was performing.
At the same time, this is clearly not a tool that's suitable for letting loose on a codebase without EXTREME supervision. This was a fresh session (no prior context to confuse it) using a tightly crafted prompt (a small, self-contained C program doing one thing) with a clear goal, and it still required constant handholding.
At the end of the day, I got the code working by editing it manually, but in an honest retrospective I would have to admit that the overall process actually didn't save me any time at all.
Ironically, despite how they're sold, these tools are infinitely better at going from code to English than going the other way around.
Brainstorming, ideation, and small, well-defined tasks where I can quickly vet the solution: these feel like the sweet spot for current frontier model capabilities.
(Unless you are pumping out some sloppy React SPA where you don't care about anything except getting it working as fast as possible - fine, get Claude Code to one-shot it)
Just two questions, if you don’t mind satisfying my curiosity.
- Did you tell it to write C? Or better yet, what was the prompt? You can use claude --resume to easily find that.
- Which model (Sonnet or Opus)? Though I'd have expected either one to work.
There's a big difference between their benchmarks and real-world coding.
It feels like part of my journey to being an "AI developer" is being present for those tradeoffs, metabolizing each one into my craft.
AI is a fickle but powerful horse. I'm finding it a privilege to learn how to be a rider.
It's amazing at reviewing code. It will identify what you fear, the horrors that lie within the codebase, and it'll bring them out into the sunlight and give you a 7-step plan for fixing them. And the coding model is good; it can write a function. But it can't follow a plan worth shit. And if I have to be extremely detailed at the function-by-function level, then I should be in the editor coding. Claude Code is an amazing niche tool for code reviews and dialogue and debugging and coping with new technologies and tools, but it is not a productivity enhancement for daily coding.
> most SWE folks still have no idea how big the difference is between the coding agents they tried a year ago and declared as useless and chatgpt 5 paired with Codex or Cursor today
Also liszper: oh, you tried the current approach and don’t agree with me? Well you just don’t know what you are doing.
For context before that I had ~15 years of experience coding the traditional way.
I think the crucial difference is that I do actually see evidence (i.e. the codebase) posted sometimes for the former; the latter could well be entirely mythos -- a 24-day-old account evangelizing for the legion-of-agents story does kind of fit the theme.
Let me guess this has something to do with AI?
The difference from an actual junior developer, of course, is that the human junior developer learns from his mistakes and gets better, but Claude seems to be stuck at the level of expertise of its model, and you have to wait for the model to improve before Claude improves.
This, this is you. This is the entire charade. It seems poetic somehow.