I'm a pretty huge proponent of AI-assisted development, but I've never found those 10x claims convincing. I've estimated that LLMs make me 2-5x more productive on the parts of my job that involve typing code into a computer, which is itself a small portion of what I do as a software engineer.
That's not too far from this article's assumptions. From the article:
> I wouldn't be surprised to learn AI helps many engineers do certain tasks 20-50% faster, but the nature of software bottlenecks mean this doesn't translate to a 20% productivity increase and certainly not a 10x increase.
I think that's an underestimate - I suspect engineers who really know how to use this stuff effectively will get more than a 20% increase - but I do think all of the other stuff involved in building software makes the 10x thing unrealistic in most cases.
This seems to be the current consensus.
A very similar quote from another recent AI article:
One host compares AI chatbots to “a very smart assistant who has a dozen Ph.D.s but is also high on ketamine like 30 percent of the time.”
https://lithub.com/what-happened-when-i-tried-to-replace-mys...
If an AI assistant was the equivalent of “a dozen PhDs” at any of the places I’ve worked you would see an 80-95% productivity reduction by using it.
They are the equivalent.
There is already an 80-95% productivity reduction by just reading about them on Hacker News.
Suppose A solves a problem and writes the solution down. B reads the answer and repeats it. Is B reasoning, when asked the same question? What about one that sounds similar?
If a human does B, that didn't require reasoning. Same for AI.
Believe it or not, people do make an effort to test their AIs on problems that they could not have seen in their training data.
https://en.m.wikipedia.org/wiki/Ketamine
Because of its hallucinogenic properties?
This property is likely an important driver of ketamine abuse and of its being rather strongly 'moreish', as well as of the subjective experience of strong expectation during a 'trip' - i.e. the tendency to develop redose loops approaching unconsciousness in a chase to 'get the message from the goddess' or whatever, which seems just out of reach (because it's actually a feeling of expectation, not a partially installed divine T3 rig).
But you're right.
I don’t think models are doing that. They certainly can retrieve a huge amount of information that would otherwise only be available to specialists such as people with PhDs… but I’m not convinced the models have the same level of understanding as a human PhD.
It’s easy to test though- the models simply have to write and defend a dissertation!
To my knowledge, this has not yet been done.
There's the old trope that systems programmers are smarter than applications programmers, but SWE-Bench puts the lie to that. Sure, SWE-Bench problems are all in the language of software; applications programmers take badly specified tickets in the language of product managers, testers and end users and have to turn them into the language of SWE-Bench to get things done. I'm not that impressed with 65% performance on SWE-Bench, because those are not the kind of tickets I have to resolve at work; rather, at work, if I want to use AI to help maintain a large codebase, I need to break the work down into that kind of ticket.
Except the documentation lies, and in reality your vendor shipped you a part with timing that is slightly out of sync with what the doc says, and after 3 months of debugging, including using an oscilloscope, you figure out WTF is going on. You report back to your supplier, and after two weeks of them not saying anything they finally reply that the timings you reverse-engineered are indeed the correct timings, sorry for any misunderstanding with the documentation.
As an applications engineer, my computer doesn't lie to me, and memory generally stays at the value I set it to unless I did something really wrong.
Backend services are the easiest thing in the world to write. I am 90% sure that all the bullshit around infra is just artificial job security, and I say this as someone who primarily does backend work nowadays.
Are they a constant source of low level annoyance? Sure. But I've never had to look at a bus timing diagram to understand how to use one, nor worried about an nginx file being rotated 90 degrees and wired up wrong!
It's funny, GitHub Copilot puts these models in the 'bargain bin' (they are free in 'ask' mode, whereas the other models count against your monthly limit of premium requests) and it's pretty clear why: they seem downright nerfed. They're tolerable for basic questions, but you wouldn't use them if price weren't a concern.
Brandwise, I don't think it does OpenAI any favors to have their models be priced as 'worthless' compared to the other models on premium request limits.
This alone is where I get a lot of my value. Otherwise, I'm using Cursor to actively solve smaller problems in whatever files I'm currently focused on. Being able to refactor things with only a couple sentences is remarkably fast.
The more you know about your language's features (and their precise names), and about higher-level programming patterns, the better time you'll have with LLMs, because what you write matches up with real documentation and examples with more precision.
I'm curious, this is js/ts? Asking because depending on the lang, good old machine refactoring is either amazeballs (Java + IDE) or non-existent (Haskell).
I don't do js/ts, so I don't know what the state of machine refactoring is in VS Code... but if it's as good as Java's, then "a couple of sentences" is quite slow compared to a keystroke or a quick dialog box with completion of symbol names.
It's not always right, but I find it helpful when it finds related changes that I should be making anyway, but may have overlooked.
Another example: selecting a block that I need to wrap (or unwrap) with tedious syntax, say I need to memoize a value with a React `useMemo` hook. I can select the value, open Quick Chat, type "memoize this", and within milliseconds it's correctly wrapped and saved me lots of fiddling on the keyboard. Scale this to hundreds of changes like these over a week, it adds up to valuable time-savings.
Even more powerful: selecting 5, 10, 20 separate values and typing: "memoize all of these" and watching it blast through each one in record time with pinpoint accuracy.
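For anyone who hasn't done this, here's the kind of wrapping I mean - a minimal sketch where `CartTotal`, `items` and `computeTotal` are hypothetical stand-ins, using React's standard `useMemo`:

    import { useMemo } from "react";

    // Hypothetical component; `items` and `computeTotal` stand in for whatever value needs memoizing.
    function CartTotal({ items }: { items: { price: number }[] }) {
      // Before: const total = computeTotal(items);  (recomputed on every render)
      // After "memoize this": recomputed only when `items` changes.
      const total = useMemo(() => computeTotal(items), [items]);
      return <span>{total}</span>;
    }

    function computeTotal(items: { price: number }[]): number {
      return items.reduce((sum, item) => sum + item.price, 0);
    }

Trivial on its own, but exactly the kind of mechanical edit that's faster to describe than to type.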
We use a Team plan ($500 /mo), which includes 250 ACUs per month. Each bug or small task consumes anywhere between 1-3 ACUs, and fewer units are consumed if you're more precise with your prompt upfront. A larger prompt will usually use fewer ACUs because follow-up prompts cause Devin to run more checks to validate its work. Since it can run scripts, compilers, linters, etc. in its own VM -- all of that contributes to usage. It can also run E2E tests in a browser instance, and validate UI changes visually.
They recommend most tasks should stay under 5 ACUs before it becomes inefficient. I've managed to give it some fairly complex tasks while staying under that threshold.
So anywhere between $2-6 per task usually.
Best analogy I've ever heard and it's completely accurate. Now, back to work debugging and finishing a vibe coded application I'm being paid to work on.
If you're not specific enough, it will definitely spit out a half-baked pseudocode file where it expects you to fill in the rest. If you don't specify certain libraries, it'll use whatever is featured in the most blogspam. And if you're in an ecosystem that isn't publicly well-documented, it's near useless.
First, until I can re-learn boundaries, they are a fiasco for work-life balance. It's way too easy to have a "hmm what if X" thought late at night or first thing in the morning, pop off a quick ticket from my phone, assign to Copilot, and then twenty minutes later I'm lying in bed reviewing a PR instead of having a shower, a proper breakfast, and fully entering into work headspace.
And on a similar thread, Copilot's willingness to tolerate infinite bikeshedding and refactoring is a hazard for actually getting stuff merged. Unlike a human colleague who loses patience after a round or two of review, Copilot is happy to keep changing things up and endlessly iterating on minutiae. Copilot code reviews are exhausting to read through because it's just so much text, so much back and forth, every little change with big explanations, acknowledgments, replies, etc.
But it is the most productive intern I've ever pair programmed with. The real ones hallucinate about as often too.
If I want to throw a shuriken abiding by some artificial, magic Magnus force like in the movie Wanted, both ChatGPT and Claude let me down, using pygame. What if I wanted C-level performance, or if I wanted to use Zig? Burp.
It works like the average Microsoft employee, like some doped version of an orange wig wearer who gets votes because his daddies kept the population as dumb as it gets after the dotcom x Facebook era. In essence, the ones to be disappointed by are the Chan-Zuckerbergs of our time. There was a chance, but there also was what they were primed for.
The best way to think of chatbot "AI" is as a compendium of human intelligence as recorded in the books and online media available to it. It is not intelligent at all on its own, and its judgement can't be better than its human sources because it has no biological drive to synthesize and excel. It's best to think of AI as a librarian of human knowledge or an interactive Wikipedia which is designed to seem like an intelligent agent but actually is not.
I suspect that some researchers with a very different approach will come up with a neural network that learns and works more like a human in future, though. Not the current LLMs, but something with a much more efficient learning mechanism that doesn't require a nuclear power station to train.
Intelligence is not some universal abstract thing achievable after a certain computational threshold is reached. Rather, it's a quality of the behavior patterns of specific biological organisms following their drives.
There's a long history in AI where neural nets were written off as useless (Minsky was the famous destroyer of the idea, I think) and yet in the end they blew away the alternatives completely.
We have something now that's useful in that it is able to glom a huge amount of knowledge, but the cost of doing so is tremendous, and therefore in many ways it's still ridiculously inferior to nature because it's only a partial copy.
A lot of science fiction has assumed that robots, for example, would automatically be superior to humans - but are robots self-repairing or self-replicating? I was reading recently about how the reasons why many developers like Python are the reasons why it can never be made fast. In other words, you cannot have everything - all features come at a cost. We will probably have AIs that are less human in some ways and more human in others, because they will offer us different trade-offs.
The suggestions were always unusably bad. The /fix results were always obviously and straight-up wrong unless it was a super silly issue.
Claude Code with Opus model on the other hand was mind-blowing to me and made me change my mind on almost everything wrt my opinion of LLMs for coding.
You still need to grow the skill of how to build the context and formulate the prompt, but the built-in execution loop is a complete game changer, and I didn't realize that until I actually used it effectively on a toy project myself.
MCP in particular was another thing I always thought was massively over hyped, until I actually started to use some in the same toy project.
Frankly, the building blocks already exist at this point to make the vast majority of all jobs redundant (and I'm thinking about all grunt-work office jobs, not coding in particular). The tooling still needs to be created, so I'm not seeing a short-term realization (<2 yrs), but medium term (5+ yrs)?
You should expect most companies to let people go in staggering numbers, with only small numbers of highly skilled people left to administer the agents.
I don't buy that. The linked article makes a solid argument for why that's not likely to happen: agentic loop coding tools like Claude Code can speed up the "writing code and getting it working" piece, but the software development lifecycle has so much other work before you get to the "and now we let Claude Code go brrrrrrr" phase.
These are exactly the people that are going to stay, medium term.
Let's explore a fictional example that somewhat resembles my, and I suspect a lot of people's, current day job.
A micro-service architecture: each team administers 5-10 services, and the whole application, which is once again only a small part of the platform as a whole, is developed by maybe 100-200 devs. So something like ~200 micro-services.
The application architects are gonna be completely safe in their jobs. And so are the lead devs in each team - at least from my perspective. Anyone else? I suspect MBAs in 5 yrs will not see their value anymore. That's gonna be the vast majority of all devs, which is likely going to cost 50% of the devs their jobs. And middle management will be slimmed down just as quickly, because you suddenly need a lot fewer managers.
tl;dr: in the future when vibe coding works 100% of the time, logically the only companies that will exist are the ones that have processes that AI can’t do, because all the other parts of the supply chain can all be done in-house
It's conceivable that that's going to happen, eventually. But that'd likely require models a lot more advanced than what we have now.
The agent approach, with lead devs administering and merging the code the agents made, is feasible with today's models. The missing part is the tooling around the models and the development practices that standardize this workflow.
That's what I'd expect to take around 5 yrs to settle.
Toy project viability does not connect with making people redundant in the process (ever, really) - at least not for me. Care to elaborate on where you draw the optimism from?
I called it a toy project because I'm not earning money with it - hence it's a toy.
It does have medium complexity with roughly 100k loc though.
And I think I need to repeat myself, because you seem to read something into my comment that I didn't say: that the building blocks exist doesn't mean that today's tooling is sufficient for this to play out today.
I very explicitly set a time horizon of 5 yrs.
"Toy project" is usually used in a different context (demonstrate something without really doing something useful): yours sounds more like a "hobby project".
At the heart is my hobby of reading web and light novels. I've been implementing various versions of a scraper and ePub reader for over 15 years now, ever since I started working as a programmer.
I've been reimplementing it over the years with the primary goal of growing my experiences/ability. In the beginning it was a plain Django app, but it grew from that to various languages such as elixir, Java (multiple times with different architecture approaches), native Android, JS/TS Frontend and sometimes backend - react, angular, trpc, svelte tanstack and more.
So I know exactly how to implement it, as I've gone through a lot of versions of the same functionality. And the last version I implemented (tanstack) was in July, via Claude Code, and got to feature parity (and more) within roughly 3 weeks.
And I might add: I'm not positive about this development either, whatsoever. I'm just expecting this to happen, to the detriment of our collective futures (as programmers)
I'm gonna pivot to building bomb shelters maybe
Or stockpiling munitions to sell during the troubles
Maybe some kind of protest support saas. Molotov deliveries as a service, you still have to light them and throw them but I guarantee next day delivery and they will be ready to deploy into any data center you want to burn down
What Im trying to say is "companies letting people go in staggering numbers" is a societal failure state not an ideal
There are so many flaws in your plan, I have no doubt that "AI" will ruin some companies that try to replace humans with a "tin can". LLMs are being inserted loosey-goosey into too many places by people that don't really understand the liability problems it creates. Because the LLM doesn't think, it doesn't have a job to protect, it doesn't have a family to feed. It can be gamed. It simply won't care.
The flaws in "AI" are already pretty obvious to anyone paying attention. It will only get more obvious the more LLMs get pushed into places they really do not belong.
And you are confident that the human receptionist will never fall for social engineering?
I don't think data protection is even close to the biggest problem with replacing all/most employees with bots.
That's the key right there. Try to use it in a project that handles PII, needs data to be exact, or has many dependencies/libraries and needs to not break for critical business functions.
(1) for my day job, it doesn't make me super productive with creation, but it does help with discovery, learning, getting myself unstuck, and writing tedious code.
(2) however, the biggest unlock is it makes working on side projects __immensely__ easier. Before AI I was always too tired to spend significant time on side projects. Now, I can see my ideas come to life (albeit with shittier code), with much less mental effort. I also get to improve my AI engineering skills without the constraint of deadlines, data privacy, tool constraints etc..
Being able to sit down after a long day of work and ask an AI model to work on some bug or feature while you relax and _not_ type code is a major boon. It is able to immediately get context and be productive even when you are not.
I hear this take a lot but does it really make that much of an improvement over what we already had with search engines, online documentation and online Q&A sites?
I know that a whole bunch of people will respond with the exact set of words that will make it show up right away on Google, but that's not the point: I couldn't remember what language it used, or any other detail beyond what I wrote and that it had been shared on Hacker News at some point, and the first couple Google searches returned a million other similar but incorrect things. With an LLM I found it right away.
The training cutoff comes into play here a bit, but 95% of the time I'm fuzzy searching like that I'm happy with projects that have been around for a few years and hence are both more mature and happen to fall into the training data.
Me, typing into a search engine, a few years ago: "Postgres CTE tutorial"
Me, typing into any AI engine, in 2025: "Here is my schema and query; optimize the query using CTEs and anything else you think might improve performance and readability"
This sort of implies you are not reading and deeply understanding your LLM output, doesn't it?
I am pretty strongly against that behavior
This can't be a serious question? 5 minutes of testing will prove to you that it's not just better, it's a totally new paradigm. I'm relatively skeptical of AI as a general purpose tool, but in terms of learning and asking questions on well documented areas like programming language spec, APIs etc it's not even close. Google is dead to me in this use case.
It is a serious question. I've spent much more than 5 minutes testing this, and I've found that your "totally new paradigm" is for morons
That 20 minutes, repeated over and over over the course of a career, is the difference between being a master versus being an amateur
You should value it, even if your employer doesn't.
Your employer would likely churn you into ground beef if there was a financial incentive to, never forget that
"You had a problem. You tried to solve it with regex. Now you have two problems"
1) your original problem 2) your broken regex
I would like to propose an addition
"You had a problem. You tried to solve it with AI generated regex. Now you have three problems"
1) your original problem 2) your broken regex 3) your reliance on AI
If you try it yourself you'll soon find out that the answer is a very obvious yes.
You don't need a paid plan to benefit from that kind of assistance, either.
At this point I am close to deciding to fully boycott it yes
> If you try it yourself you'll soon find out that the answer is a very obvious yes
I have tried plenty over the years, every time a new model releases and the hype cycle fires up again I look in to see if it is any better
I try to use it a couple of weeks, decide it is overrated and stop. Yes it is improving. No it is not good enough for me to trust
How have you found it not to be significantly better for those purposes?
The "not good enough for you to trust" is a strange claim. No matter what source of info you use, outside of official documentation, you have to assess its quality and correctness. LLM output is no different.
Not even remotely
> LLM output is no different
It is different
A search result might take me to the wrong answer but an LLM might just invent nonsense answers
This is a fundamentally different thing and is more difficult to detect imo
> This is a fundamentally different thing and is more difficult to detect imo
99% of the time it's not. You validate and correct/accept like you would any other suggestion.
As far as writing "tedious" code goes, I think the AI agents are great. Where I have personally found a huge advantage is in keeping documentation up to date. I'm not sure if it's because I have ADHD or because my workload is basically enough for 3 people, but this is an area I struggle with. In the past, I've often let the code be its own documentation, because that would be better than having outdated/wrong documentation. With AI agents, I find that I can have good documentation that I don't need to worry about beyond approving in the keep/discard part of the AI agent.
I also rarely write SQL, Bicep, YAML configs and similar these days, because it's so easy to determine if the AI agent got it wrong. This requires that you're an expert on infrastructure as code and SQL, but if you are, the AI agents are really fast. I think this is one of the areas where they 10x at times. I recently wrote an ingress for an FTP pod (don't ask), and writing all those ports for passive mode would've taken me a while.
There are a lot of risks involved. If you can't spot errors or outdated functionality quickly, then I would highly recommend you don't do this. Bicep LLM output is often not up to date, and since the docs are excellent, what I do in those situations is copy/paste what I need. Then I let the AI agent update things like parameters, which certainly isn't 10x but is still faster than I can do it.
Similarly, it's rather good at writing and maintaining automated tests. I wouldn't recommend this unless you're actively dealing with corrupted states directly in your code. But we do fail-fast programming/Design by Contract, so the tests are really just an extra precaution and compliance thing, meaning they aren't as vital as they would be with more implicit ways of dealing with error handling.
I don't think AIs are good at helping you with learning or getting unstuck. I guess it depends on how you would normally deal with that. If the alternative is "Google programming", I imagine it's sort of similar and probably more effective. It's probably also more dangerous. At least we've found that our engineers are more likely to trust the LLM than a Medium article or a Stack Overflow thread.
I haven't begun doing side projects or projects for self, yet. But I did go down the road of finding out what would be needed to do something I wished existed. It was much easier to explore and understand the components and I might have a decent chance at a prototype.
The alternative to this would have been to ask people around or formulate extensively researched questions for online forums, where I'd expect to get half cryptic answers (and a jibe at my ignorance every now and then) at a pace that I would take years before I had something ready.
I see the point for AI as a prototyping and brainstorming tool. But I doubt we are at a point where I would be comfortable pushing changes to a production environment without giving 3x the effort in reviewing. Since there's a chance of the system hallucinating, I have a genuine fear that it would seem accurate, but what it would do is something really really stupid.
For 20 a month I can get my stupid tool and utility ideas from "it would be cool if I could..." to actual "works well enough for me" -tools in an evening - while I watch my shows at the same time.
After a day at work I don't have the energy to start digging through, say, OpenWeather's latest 3.0 API and its nuances and how I can refactor my old code to use the new API.
Claude did it in maybe one episode of What We Do in the Shadows :D I have a hook that makes my computer beep when Claude is done or pauses for a question, so I can get back, check what it did and poke it forward.
claude config set --global preferredNotifChannel terminal_bell
https://docs.anthropic.com/en/docs/claude-code/terminal-conf...
The smartest programmer I know is so impressive mainly for two reasons: first, he seems to have just an otherworldly memory and seems to kind of have absolutely every little feature and detail of the programming languages he uses memorized. Second, his real power is really in cognitive ability, or the ability to always quickly and creatively come up with the smartest and most efficient yet elegant and clean solution to any given problem. Of course somewhat opinionated but in a good way. Funnily he often wouldn't know the academic/common name for some algorithm he arrived at but it just happened to be what made sense to him and he arrived at it independently. Like a talented musician with perfect pitch who can't read notation or doesn't know theory yet is 10x more talented than someone who has studied it all.
When I pair program with him, it's evident that the current iteration of AI tools is not as quick or as sharp. You could arrive at similar solutions but you would have to iterate for a very long time. It would actually slow that person down significantly.
However, there is such a big spectrum of ability in this field that I could actually see this increasing for example my productivity by 10x. My background/profession is not in software engineering but when I do it in my free time the perfectionist tendencies make me work very slowly. So for me these AI tools are actually cool for generating the first crappy proof of concepts for my side projects/ideas, just to get something working quickly.
It helps me being lazy because I have a rough expectation of what the outcome should be - and I can directly spot any corner cases or other issues the AI proposed solution has, and can either prompt it to fix that, or (more often) fix those parts myself.
The bottom 20% may not have enough skill to spot that, and they'll produce superficially working code that'll then break in interesting ways. If you're in an organization that tolerates copy and pasting from stack overflow that might be good enough - otherwise the result is not only useless, but as it provides the illusion of providing complete solution you're also closing the path of training junior developers.
Pretty much all AI attributed firings were doing just that: Get rid of the juniors. That'll catch up with us in a decade or so. I shouldn't complain, though - that's probably a nice earning boost just before retirement for me.
I was watching to learn how other devs are using Claude Code, as on my first attempt I pretty quickly ran into a huge mess and was specifically looking for how to debug better with MCP.
The most striking thing is she keeps having to stop it doing really stupid things. She glosses over those points a little bit by saying things like "I roughly know what this should look like, and that's not quite right" or "I know that's the old way of installing TailwindCSS, I'll just show you how to install Context7", etc.
But in each 10-minute episode (which has time skips while CC thinks) it happens at least twice. She has to bring her senior dev skills in, and it's only due to her skill that she can spot the problem in seconds flat.
And after watching much of it, though I skipped a few episodes at the end, I'm pretty certain I could have coded the same app quicker than she did without agentic AI, just using the old chat-window AIs to bash out the React boilerplate and help me quickly scan the documentation for getting offline. The initial estimate of 18 days the AI came up with in the plan phase would only hold true if you had to do it "properly".
I'm also certain she could have too.
[1] https://www.youtube.com/watch?v=erKHnjVQD1k
It's worth a watch if you're not doing agentic coding yet. There were points I was impressed with what she got it to do. The TDD section was quite impressive in many ways, though it immediately tried to cheat and she had to tell it to do it properly.
I posted a demo here a while ago where I try to have it draw turtle graphics:
https://news.ycombinator.com/item?id=44013939
Since then I've also provided enough glue that it can interact with the Arch Linux installer in a VM (or actual hardware, via serial port) - with sometimes hilarious results, but at least some LLMs do manage to install Arch with some guidance:
https://github.com/aard-fi/arch-installer
Somewhat amusingly, some LLMs have a tendency to just go on with it (even when it fails), with rare hallucinations - while others directly start lying and only pretend they logged in.
If I'm writing a series of very similar test cases, it's great for spamming them out quickly, but I still need to make sure they're actually right. Spotting errors is actually easier because I didn't type the code out myself.
It's also decent for writing various bits of boilerplate for list / dict comprehensions, log messages (although they're usually half wrong, but close enough to what I was thinking), time formatting, that kind of thing. All very standard stuff that I've done a million times but I may be a little rusty on. Basically StackOverflow question fodder.
But for anything complex and domain-specific, it's more wrong than it's right.
but the principle is the same: if the human isn’t doing theory-building, then no one is
It's like WordPress all over again, but with people even less able to code. There's going to be vast amounts of opportunities for people to get into the industry via this route, but it's not going to be a very nice route for many of them. Lots of people who understand software even less than the c-suite holding the purse strings.
People keep focusing on general-intelligence-style capabilities, but that is the holy grail. The world could go through multiple revolutions before finding that holy grail, but even before then everything would have changed beyond recognition.
So write an integration over the API docs I just copy-pasted.
This is particularly true for headlines like this one which stand alone as statements.
Again, appreciate your thoughts, I have a huge amount of respect for your work. I hope you have a good one!
Well, the people who quote from TFA have usually at least read the part they quoted ;)
[And to those saying we're using it wrong... well I can't argue with something that's not falsifiable]
I am not allowed to use LLMs at work for work code so I can't tell what claims are real. Just my 80s game reimplementations of Snake and Asteroids.
https://www.construx.com/blog/productivity-variations-among-...
Depending on the environment, I can imagine the worst devs being net negative.
Thinking about it personally, a 10X label means I'm supposedly the smartest person in the room and that I'm earning 1/10th what I should be. Both of those are huge negatives.
I have found for myself it helps motivate me, resulting in net productivity gain from that alone. Even when it generates bad ideas, it can get me out of a rut and give me a bias towards action. It also keeps me from procrastinating on icky legacy codebases.
I guess this is still the "caveat" that can keep the hype hopes going. But I've found at a team velocity level, with our teams, where everyone is actively using agentic coding like Claude Code on the daily, we actually didn't see an increase in team velocity yet.
I'm curious to hear anecdotes from other teams: has your team seen velocity increase since it adopted agentic AI?
This article thinks that most people who say 10x productivity are claiming 10x speedup on end-to-end delivering features. If that's indeed what someone is saying, they're most of the time quite simply wrong (or lying).
But I think some people (like me) aren't claiming that. Of course the end to end product process includes a lot more work than just the pure coding aspect, and indeed none of those other parts are getting a 10x speedup right now.
That said, there are a few cases where this 10x end-to-end is possible. E.g. when working alone, especially on new things but not only - you're skipping a lot of this overhead. That's why smaller teams, even solo teams, are suddenly super interesting - because they are getting a bigger speedup comparatively speaking, and possibly enough of one to be able to rival larger teams.
If I'm using it to remember the syntax or library for something I used to know how to do, it's great.
If I'm using it to explore something I haven't done before, it makes me faster, but sometimes it lies to me. Which was also true of Stack Overflow.
But when I ask it to do something fairly complex on its own, it usually tips over. I've tried a bunch of tests with a bunch of models, and it never quite gets it right. Sometimes it's minor stuff that I can fix if I bang on it long enough, and sometimes it's a steaming pile that I end up tossing in the garbage.
For example, I've asked it to code me a web-based calculator, or a 3D model of the solar system using WebGL, and none of the models I've tried have been able to do either.
I think that the key realization is that there are tasks where LLMs excel and might even buy you 10x productivity, whereas some tasks their contribution might even be net negative.
LLMs are largely excellent at writing and refactoring unit tests, mainly because the context is very limited (i.e., write a method in a class that calls this specific method of this specific class in a specific way and check the output) and the output is very repetitive (i.e., isolated methods in standalone classes, with no return value, that aren't called anywhere else). They also seem helpful when prompted to add logging. LLMs are also effective in creating greenfield projects, serving as glorified template engines. But when lightly pressed on specific tasks like implementing a cross-domain feature... their output starts to be at best a big ball of mud.
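To make "very limited context" concrete, here's a minimal sketch of the kind of self-contained test I mean (the `PriceCalculator` class is hypothetical, and I'm assuming Jest):

    import { describe, expect, it } from "@jest/globals";

    // Hypothetical class under test; the point is the test only needs this one method in view.
    class PriceCalculator {
      applyDiscount(price: number, percent: number): number {
        if (percent < 0 || percent > 100) throw new RangeError("percent out of range");
        return price * (1 - percent / 100);
      }
    }

    describe("PriceCalculator.applyDiscount", () => {
      it("applies a percentage discount", () => {
        expect(new PriceCalculator().applyDiscount(200, 25)).toBe(150);
      });

      it("rejects discounts over 100%", () => {
        expect(() => new PriceCalculator().applyDiscount(200, 150)).toThrow(RangeError);
      });
    });

Everything the model needs fits on one screen, which is exactly why it does well here and falls apart on cross-domain work.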
And does an AI agent doing a code review actually reduce that time too? I have doubts. Caveat, I haven't seen it in practice yet.
What will happen is over time this will become the new baseline for developing software.
It will mean we can deliver software faster. Maybe more so than other advances, but it won't fundamentally change the fact that software takes real effort and that effort will not go away, since that effort is much more than just coding this or that function.
I could create a huge list of things that have made developing and deploying quality software easier: linters, static type checkers, code formatters, hot reload, intelligent code completion, distributed version control (i.e., Git), unit testing frameworks, inference schema tools, code from schema, etc. I'm sure others can add dozens of items to that list. And yet there seems to be an unending amount of software to be built, limited only by the people available to build it and an organization's funding to hire those people.
In my personal work, I've found AI-assisted development to make me faster (not sure I have a good estimate for how much faster.) What I've also found is that it makes it much easier to tackle novel problems within an existing solution base. And I believe this is likely to be a big part of the dev productivity gain.
Just an example: let's say we want to use the strangler pattern as part of our modernization approach for a legacy enterprise app that has seen better days. Unless you have some senior devs who are both experienced with that pattern AND experienced with your code base, it can take a lot of trial and error to figure out how to make it work. (As you said, most of our work isn't actually typing code.)
This is where an AI/LLM tool can go to work on understanding the code base and understanding the pattern to create a reference implementation approach and tests. That can save a team of devs many weeks of trial & error (and stress) not to mention guidance on where they will run into roadblocks deep into the code base.
And, in my opinion, this is where a huge portion of the AI-assisted dev savings will come from - not so much writing the code (although that's helpful) but helping devs get to the details of a solution much faster.
It's that googling has always gotten us to generic references, while AI gets us those references fitted to our solution.
And we're not seeing that at all. The companies whose software I use that did announce big AI initiatives 6 months ago, if they really had gotten 10x productivity gain, that'd be 60 months—5 years—worth of "productivity". And yet somehow all of their software has gotten worse.
This feels exactly right and is what I’ve thought since this all began.
But it also makes me think maybe there are those that A.I. helps 10x, but more because that code input is actually a very large part of their job. Some coders aren’t doing much design or engineering, just assembly.
I don't think I've encountered programmer like that in my own career, but I guess they might exist somewhere!
Claude Code (which is apparently the best in general) isn't very good at reviewing existing large projects IME, because it doesn't want to load a lot of text into its context. If you ask it to review an existing project it'll search for keywords instead of just loading an entire file.
That and it really wants to please you, so if you imply you own a project it'll be a lot more positive than it may deserve.
The hardest part of my job is actually understanding the problem space and making sure we're applying the correct solution. Actual coding is probably about 30% of my job.
That means, I'm only looking at something like 30% productivity gain by being 5x as effective at coding.
Now when I'm designing software there are all sorts of things where I'm much less likely to think "nah, that will take too long to type the code for".
But of course that’s ridiculous.
10x is intended to symbolize a multiplier. As Microsoft fired that guy, 10 × 0 is still 0.
I'm not sure it is and I'll take it a step further:
Over the course of development, efficiency gains trend towards zero.
AI has a better case for increasing surface area (what an engineer is capable of working on) and effectiveness, but efficiency is a mirage.
Who's making these claims?
It didn’t.
I’ll admit this is not helping the case of “but people are saying…”
You understand what YC is right?
1.2x increase
I've been heavily leaning on AI for an engagement that would otherwise have been impossible for me to deliver to the same parameters and under the same constraints. Without AI, I simply wouldn't have been able to fit the project into my schedule, and would have turned it down. Instead, not only did I accept and fit it into my schedule, I was able to deliver on all stretch goals, put in much more polish and automated testing than originally planned, and accommodate a reasonable amount of scope creep. With AI, I'm now finding myself evaluating other projects to fit into my schedule going forward that I couldn't have considered otherwise.
I'm not going to specifically claim that I'm an "AI 10x engineer", because I don't have hard metrics to back that up, but I'd guesstimate that I've experienced a ballpark 10x speedup for the first 80% of the project and maybe 3 - 5x+ thereafter depending on the specific task. That being said, there was one instance where I realized halfway through typing a short prompt that it would have been faster to make those particular changes by hand, so I also understand where some people's skepticism is coming from if their impression is shaped by experiences like that.
I believe the discrepancy we're seeing across the industry is that prompt-based engineering and traditional software engineering are overlapping but distinct skill sets. Speaking for myself, prompt-based engineering has come naturally due to strong written communication skills (e.g. experience drafting/editing/reviewing legal docs), strong code review skills (e.g. participating in security audits), and otherwise being what I'd describe as a strong "jack of all trades, master of some" in software development across the stack. On the other hand, for example, I could easily see someone who's super 1337 at programming high-performance algorithms and mid at most everything else finding that AI insufficiently enhances their core competency while also being difficult to effectively manage for anything outside of that.
As to how I actually approach this:
* Gemini Pro is essentially my senior engineer. I use Gemini to perform codebase-wide analyses, write documentation, and prepare detailed sprint plans with granular todo lists. Particularly for early stages of the project or major new features, I'll spend several hours at a time meta-prompting and meta-meta-prompting with Gemini just to get a collection of prompts, documents, and JSON todo lists that encapsulate all of my technical requirements and feedback loops. This is actually harder than manual programming because I don't get the "break" of typing out all the trivial and boilerplate parts of coding; my prompts here are much more information-dense than code.
* Claude Sonnet is my coding agent. For Gemini-assisted sprints, I'll fire Claude off with a series of pre-programmed prompts and let it run for hours overnight. For smaller things, I'll pair program with Claude directly and multitask while it codes, or if I really need a break I'll take breaks in between prompting.
* More recently, Grok 4 through the Grok chat service is my Stack Overflow. I can't rave enough about it. Asking it questions and/or pasting in code diffs for feedback gets incredible results. Sometimes I'll just act as a middleman pasting things back and forth between Grok and Claude/Gemini while multitasking on other things, and find that they've collaboratively resolved the issue. Occasionally, I've landed on the correct solution on my own within the 2 - 3 minutes it took for Grok to respond, but even then the second opinion was useful validation. o3 is good at this too, but Grok 4 has been on another level in my experience; its information is usually up to date, and its answers are usually either correct or at least on the right track.
* I've heard from other comments here (possibly from you, Simon, though I'm not sure) that o3 is great at calling out anti-patterns in Claude output, e.g. its obnoxious tendency to default to keeping old internal APIs and marking them as "legacy" or "for backwards compatibility" instead of just removing them and fixing the resulting build errors. I'll be giving this a shot during tech debt cleanup.
As you can see, my process is very different from vibe coding. Vibe coding is fine for prototyping, or for non-engineers with no other options, but it's not how I would advise anyone to build a serious product for critical use cases.
One neat thing I was able to do, with a couple days' notice, was add a script to generate a super polished product walkthrough slide deck with a total of like 80 pages of screenshots and captions covering different user stories, with each story having its own zoomed out overview of a diagram of thumbnails linking to the actual slides. It looked way better than any other product overview deck I've put together by hand in the past, with the bonus that we've regenerated it on demand any time an up-to-date deck showing the latest iteration of the product was needed. This honestly could be a pretty useful product in itself. Without AI, we would've been stuck putting together a much worse deck by hand, and it would've gotten stale immediately. (I've been in the position of having to give disclaimers about product materials being outdated when sharing them, and it's not fun.)
Anyway, I don't know if any of this will convince anyone to take my word for it, but hopefully some of my techniques can at least be helpful to someone. The only real metric I have to share offhand is that the project has over 4000 (largely non-trivial) commits made substantially solo across 2.5 months on a part-time schedule juggled with other commitments, two vacations, and time spent on aspects of the engagement other than development. I realize that's a bit vague, but I promise that it's a fairly complex project which I feel pretty confident I wouldn't have been capable of delivering in the same form on the same schedule without AI. The founders and other stakeholders have been extremely satisfied with the end result. I'd post it here for you all to judge, but unfortunately it's currently in a soft launch status that we don't want a lot of attention on just yet.
Now that LLMs have actually fulfilled that dream — albeit by totally different means — many devs feel anxious, even threatened. Why? Because LLMs don’t just autocomplete. They generate. And in doing so, they challenge our identity, not just our workflows.
I think Colton’s article nails the emotional side of this: imposter syndrome isn’t about the actual 10x productivity (which mostly isn't real), it’s about the perception that you’re falling behind. Meanwhile, this perception is fueled by a shift in what “software engineering” looks like.
LLMs are effectively the ultimate CASE tools — but they arrived faster, messier, and more disruptively than expected. They don’t require formal models or diagrams. They leap straight from natural language to executable code. That’s exciting and unnerving. It collapses the old rites of passage. It gives power to people who don’t speak the “sacred language” of software. And it forces a lot of engineers to ask: What am I actually doing now?
Now I can always switch to a different model, increase the context, prompt better etc. but I still feel that actual good quality AI code is just out of arms reach, or when something clicks, and the AI magically starts producing exactly what I want, that magic doesn't last.
Like with stable diffusion, people who don't care as much or aren't knowledgeable enough to know better, just don't get what's wrong with this.
A week ago, I received a bug ticket claiming one of the internal libs i wrote didn't work. I checked out the reporter's code, which was full of weird issues (like the debugger not working and the typescript being full of red squiggles), and my lib crashed somewhere in the middle, in some esoteric minified js.
When I asked the guy who wrote it what's going on, he admitted he vibe coded the entire project.
And the knock-on effect is that there is less menial work. Artists are commissioned less for the local fair, their friend's D&D character portrait, etc. Programmers find less work building websites for small businesses, fixing broken widgets, etc.
I wonder if this will result in fewer experts, or less capable ones. As we lose the jobs that were previously used to hone our skills will people go out of their way to train themselves for free or will we just regress?
A schematic of a useless amplifier that oscillates looks just as pretty as one of a correct amplifier. If we just want to use it as a repeated print for the wallpaper of an electronic lab, it doesn't matter.
This really irritates me. I’ve had the same experience with teammates’ pull requests they ask me to review. They can’t be bothered to understand the thing, but then expect you to do it for them. Really disrespectful.
Even if LLMs worked perfectly without hallucinations (they don't and might never), a conscientious developer must still comprehend every line before shipping it. You can't review and understand code 10x faster just because an LLM generated it.
In fact, reviewing generated code often takes longer because you're reverse-engineering implicit assumptions rather than implementing explicit intentions.
The "10x productivity" narrative only works if you either:
- Are not actually reviewing the output properly
or
- Are working on trivial code where correctness doesn't matter.
Real software engineering, where bugs have consequences, remains bottlenecked by human cognitive bandwidth, not code generation speed. LLMs shifted the work from writing to reviewing, and that's often a net negative for productivity.
This seems excessive to me. Do you comprehend the machine code output of a compiler?
I must comprehend code at the abstraction level I am working at. If I write Python, I am responsible for understanding the Python code. If I write Assembly, I must understand the Assembly.
The difference is that Compilers are deterministic with formal specs. I can trust their translation. LLMs are probabilistic generators with no guarantees. When an LLM generates Python code, that becomes my Python code that I must fully comprehend, because I am shipping it.
That is why productivity is capped at review speed, you can't ship what you don't understand, regardless of who or what wrote it.
It can actually be worse when they do. Formalizing behavior means leaving out behavior that can't be formalized, which basically means if your language has undefined behavior then the handling of that will be maximally confusing, because your compiler can no longer have hacks for handling it in a way that "makes sense".
There's many jobs that can be eliminated with software, but haven't because managers don't want to hire SWEs without proven value. I don't think HN realizes how big that market is.
With AI, the managers will replace their employees with a bunch of code they don't understand, watch that code fail in 3 years, and have to hire SWEs to fix it.
I'd bet those jobs will outnumber the ones initially eliminated by having non-technical people deliver the first iteration.
Many of those jobs will be high-skill/impact because they are necessarily focused on fixing stuff AI can't understand.
The names all looked right, the comments were descriptive, it had test cases demonstrating the code works. It looked like something I'd expect a skilled junior or a senior to write.
The thing is, the code didn't work right, and the reasons it didn't work were quite subtle. Nobody would have fixed it without knowing how to have done it in the first place, and it took me nearly as long to figure out why as if I'd just written it myself in the first place.
I could see it being useful to a junior who hasn't solved a particular problem before and wanted to get a starting point, but I can't imagine using it as-is.
Nor do they produce those (do they?). That is what I would like to see. Formal models and diagrams are not needed to produce code. Their point is that they allow us to understand code and to formalize what we want it to do. That's what I'm hoping AI could do for me.
> Why? Because LLMs don’t just autocomplete. They generate. And in doing so, they challenge our identity, not just our workflows.
is what raised flags in my head. Rather than explain the difference between glorified autocompletion and generation, the post assumes there is a difference then uses florid prose to hammer in the point it didn't prove.
I've heard the paragraph "why? Because X. Which is not Y. And abcdefg" a hundred times. Deepseek uses it on me every time I ask a question.
Which came first...
And while I don't categorically object to AI tools, I think you're selling objections to them short.
It's completely legitimate to want an explainable/comprehensible/limited-and-defined tool rather than an "it just works" tool. Ideally, this puts one in an "I know it's right" position rather than an "I scanned it and it looks generally right and seems to work" position.
- it’s not just X, it’s Y
- emdashes everywhere
The problem is that AI needs to be spoon-fed overly detailed dos and don'ts, and even then the output can't be trusted without carefully checking it. It's easy to reach a point where breaking down the problem into pieces small enough for AI to understand takes more work than just writing the code.
AI may save time when it generates the right thing on the first try, but that's a gamble. The code may need multiple rounds of fixups, or end up needing a manual rewrite anyway, after wasting time and effort on instructing the AI. The ceiling of AI capabilities is very uneven and unpredictable.
Even worse, the AI can confidently generate code that looks superficially correct, but has subtle bugs/omissions/misinterpretations that end up costing way more time and effort than the AI saved. It has uncanny ability to write nicely structured, well-commented code that is just wrong.
It's a brave, weird and crazy new world. "The future is now, old man."
I've told the same Claude to write me unit tests for a very well known well-documented API. It was too dumb to deduce what edge cases it should test, so I also had to give it a detailed list of what to test and how. Despite all of that, it still wrote crappy tests that misused the API. It couldn't properly diagnose the failures, and kept adding code for non-existing problems. It was bad at applying fixes even when told exactly what to fix. I've wasted a lot of time cleaning up crappy code and diagnosing AI-made mistakes. It would have been quicker to write it all myself.
I've tried Claude and GPT4o for a task that required translating imperative code that writes structured data to disk field by field into explicit schema definitions. It was an easy, but tedious task (I've had many structs to convert). AI hallucinated a bunch of fields, and got many types wrong, wasting a lot of my time on diagnosing serialization issues. I really wanted it to work, but I've burned over $100 in API credits (not counting subscriptions) trying various editors and approaches. I've wasted time and money managing context for it, to give it enough of the codebase to stop it from hallucinating the missing parts, but also carefully trim it to avoid distracting it or causing rot. It just couldn't do the work precisely. In the end I had scrap it all, and do it by hand myself.
I've tried gpt4o and 4-mini-high to write me a specific image processing operation. They could discuss the problem with seemingly great understanding (referencing academic research, advanced data structures). I even got a Python script that had correct syntax on the first try! But the implementation had a fundamental flaw that caused numeric overflows. AI couldn't fix it itself (it kept inventing stupid workarounds that didn't work or even defeated the point of the whole algorithm). When told step by step what to do to fix it, it kept breaking other things in the process.
I've tried to make AI upgrade code using an older version of a dependency to a newer one. I provided it with relevant quotes from the docs (I knew the version would have been newer than its knowledge cutoff), and even converted parts of the code myself, so it could just follow the pattern. The AI couldn't properly copy-paste code from one function to another. It kept reverting things. When I pointed out the issues, it kept apologising, saying what new APIs it was going to use, and then using the old APIs again!
I've also briefly tried GH copilot, but it acted like level 1 tech support, despite burning tokens of a more capable model.
I was surprised that with Claude Code I was able to get a few complex things done that I had anticipated would take a few weeks to uncover, stitch together and get moving.
Instead I pushed Claude to consistently present the correct understanding of the problem, structure, and approach to solving things, and only after that was OK was it allowed to propose changes.
True to its shiny-things corpus, it will overcomplicate things because it hasn't learned that less is more. Maybe that reflects the corpus of the average code.
Looking at how folks are setting up their claude.md and agents can go a long way if you haven't had a chance yet.
I find it impossible to work out who to trust on the subject, given that I'm not working directly with them, so remain entirely on the fence.
But nobody has ever managed to get there despite decades of research and work done in this area. Look at the work of Gerald Sussman (of SICP fame), for example.
So all you're saying is it makes the easy bit easier if you've already done, and continue to do, the hard bit. This is one of the points made in TFA. You might be able to go 200mph in a straight line, but you always need to slow down for the corners.
What you need is just boring project management. Have a proper spec, architecture and tasks split into manageable chunks with enough information to implement them.
Then you just start watching TV and say "implement github issue #42" to Claude and it'll get on with it.
But if you say "build me facebook" and expect a shippable product, you'll have a bad time.
One thing that AI has helped me with is finding pesky bugs. I mainly work on numerical simulations. At one point I was stuck for almost a week trying to figure out why my simulation was acting so strange. Finally I pulled up chatgpt, put some of my files into the context and wrote a prompt explaining the strange behavior and what I thought might be happening. In a few seconds it figured out that I had improperly scaled one of my equations. It came down to a couple missing parentheses, and once I fixed it the simulation ran perfectly.
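A hypothetical sketch of that class of bug (not the actual simulation code): a missing pair of parentheses quietly changes which terms get divided by the scale factor.

    alpha, dx = 1e-3, 0.01
    T_left, T_mid, T_right = 300.0, 305.0, 310.0

    # Buggy: only part of the stencil is divided by dx**2, so the term is
    # scaled wrongly and the simulation behaves strangely.
    dT_buggy = alpha * (T_left - 2 * T_mid) / dx**2 + T_right

    # Intended: the whole stencil is divided by dx**2.
    dT_ok = alpha * (T_left - 2 * T_mid + T_right) / dx**2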
This has happened a few times where AI was easily able to see something I was overlooking. Am I a 10x developer now that I use AI? No... but when used well, AI can have a hugely positive impact on what I am able to get done.
It’s a rubber duck that’s pretty educated and talks back.
What I've seen with AI is that it does not save my coworkers from the pain of overcomplicating simple things that they don't really think through clearly. AI does not seem to solve this.
Using AI will change nothing in this context.
This has never been the case in any company I've ever worked at. Even if you can finish your day's work in, say, 4 hours, you can't just dip out for the other 4 hours of the day.
Managers and teammates expect you to be available at the drop of a hat for meetings, incidents, random questions, "emergencies", etc.
Most jobs I've worked at eventually devolve into something like "Well, I've finished what I wanted to finish today. I could either stare at my monitor for the rest of the day waiting for something to happen, or I could go find some other work to do. Guess I'll go find some other work to do since that's slightly less miserable".
You also have to delicately "hide" the fact that you can finish your work significantly faster than expected. Otherwise the expectations of you change and you just get assigned more work to do.
Literally unwinnable scenarios. Only way to succeed is to just sit your ass in the chair. Almost no manager actually cares about your actual output - they all care about presentation and appearances.
Uh, no?
I had a task to do a semi-complex UI addition; the whole week was allocated for it.
I sicced the corp-approved GitHub Copilot with 4o and Claude 3.7 on it and it was done in an afternoon. It's ~95% functionally complete, but ugly as sin. (The model didn't understand our specific Tailwind classes.)
Now I can spend the rest of the week on polish.
So always aim for outcomes, not output :)
At my company, we did promote people quickly enough that their salaries are now close to double what they were when they started a year or so ago, due to their added value as engineers on the team. It gets tougher as they get into senior roles, but even there, there's quite a bit of room for differentiation.
Additionally, since this is a market, you should not even expect to be paid twice as much for 2x the value provided: at that price it makes no difference to a company if they get two 1x engineers instead, and you are really not that special if you are double the cost. So really, the "fair" value is somewhere in between: 1.5x to equally reward both parties, or leaning one way or the other :)
The thing is that the company is hunting for better value, and you are looking for a better deal.
If a company can get 2x an engineer's production from you at a lower cost, you are only more valuable than two engineers producing as much if you are cheaper than they are. Your added value is that extra 1x of production, but if you are selling it at the same price, they are just as well off hiring two engineers instead of you: there is no benefit to having you over them.
If you can do it cheaper, then you are more valuable the cheaper you are. Which is why I said 1.5x cost is splitting the value/cost between you and the employer.
You can certainly be very productive by doing what you are told. I'd probably fail at that metric against many engineers, yet people usually found me very valuable to their teams (I never asked if it was 1x or 2x or 0.5x compared to whatever they perceive as average).
The last few years, I am focused on empowering engineers to be equal partners in deciding what is being done, by teaching them to look for and suggest options which are 10% of the effort and 90% of the value for the user (or 20/80, and sometimes even 1% effort for 300% the value). Because they can best see what is simple and easy to do with the codebase, so if they put customer hat on, they unlock huge wins for their team and business.
100%. The biggest challenge with software is not that it’s too hard to write, but that it’s too easy to write.
Most of the AI productivity stories I hear sound like they're optimizing for the wrong metric. Writing code faster doesn't necessarily mean shipping better products faster. In my experience, the bottleneck is rarely "how quickly can we type characters into an editor" - it's usually clarity around requirements, decision-making overhead, or technical debt from the last time someone optimized for speed over maintainability.
The author mentions that real 10x engineers prevent unnecessary work rather than just code faster. That rings true to me. I've seen more productivity gains from saying "no" to features or talking teams out of premature microservices (or adopting Kafka :D) than from any coding tool.
What worries me more is the team dynamic this creates. When half your engineers feel like they're supposed to be 10x more productive and aren't, that's a morale problem that compounds. The engineers who are getting solid 20-30% gains from AI (which seems realistic) start questioning if they're doing it wrong.
Has anyone actually measured this stuff properly in a production environment with consistent teams over 6+ months? Most of the data I see is either anecdotal or from artificial coding challenges.
You are right that typing speed isn't the bottleneck, but wrong about what AI actually accelerates. The 10x engineers aren't typing faster; they're exploring 10 different architectural approaches in the time it used to take to try one, validating ideas through rapid prototyping, and automating the boring parts to focus on the hard decisions.
You can't evaluate a small sample size of people who are not exploiting the benefits well and come to an accurate assessment of the utility of a new technology.
Skill is always a factor.
Plus there are use-cases for LLMs that go beyond augmenting your ability to produce code, especially for learning new technologies. The yield depends on the distribution of tasks you have in your role. For example, if you are in lots of meetings, or have lots of administrative overhead to push code, LLMs will help less. (Although I think applying LLMs to pull request workflow, commit cleanup and reordering, will come soon).
That aside: I still think complaining about "hallucination" is a pretty big "tell".
The conversation around LLMs is so polarized. Either they’re dismissed as entirely useless, or they’re framed as an imminent replacement for software developers altogether.
Hallucinations are worth talking about! Just yesterday, for example, Claude 4 Sonnet confidently told me Godbolt was wrong wrt how clang would compile something (it wasn’t). That doesn’t mean I didn’t benefit heavily from the session, just that it’s not a replacement for your own critical thinking.
Like any transformative tool, LLMs can offer a major productivity boost but only if the user can be realistic about the outcome. Hallucinations are real and a reason to be skeptical about what you get back; they don’t make LLMs useless.
To be clear, I’m not suggesting you specifically are blind to this fact. But sometimes it’s warranted to complain about hallucinations!
Usually, such a loop just works. In the cases where it doesn't, it's often because the LLM decided it would be convenient if some method existed, and therefore that method exists: it tries to call that method, fails in the linting step, decides that it is the linter that is wrong, and changes the linter configuration (or fails in the test step and updates the tests). If in this loop I automatically revert all test and linter config changes before running the checks, the LLM will receive the real test output, report that the tests passed anyway, and end the loop if it has control (or get caught in a failure spiral if the scaffold automatically continues until the tests pass).
It's not an extremely common failure mode, as it generally only happens when you give the LLM a problem where it's both automatically verifiable and too hard for that LLM. But it does happen, and I do think "hallucination" is an adequate term for the phenomenon (though perhaps "confabulation" would be better).
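For the curious, a minimal sketch of that kind of scaffold loop (assuming a git checkout, npm-based lint/test commands, and an invented ask_llm_for_patch helper), with the revert step that keeps the model from "fixing" failures by weakening the checks:

    import subprocess

    PROTECTED = ["tests/", ".eslintrc.json"]  # paths the model must not change

    def run(cmd):
        return subprocess.run(cmd, capture_output=True, text=True)

    def agent_loop(task, max_iters=5):
        for _ in range(max_iters):
            ask_llm_for_patch(task)  # hypothetical: the model edits the working tree
            # Revert any edits to tests or linter config before checking.
            run(["git", "checkout", "--"] + PROTECTED)
            lint, test = run(["npm", "run", "lint"]), run(["npm", "test"])
            if lint.returncode == 0 and test.returncode == 0:
                return True
            task += "\n\nLatest failures:\n" + lint.stdout + lint.stderr + test.stdout + test.stderr
        return False  # still failing: likely the "too hard for this model" case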
Aside:
> I can't imagine an agent being given permission to iterate Terraform
Localstack is great and I have absolutely given an LLM free rein over terraform config pointed at localstack. It has generally worked fine and written the same tf I would have written, but much faster.
Anyway, I still see hallucinations in all languages, even javascript, attempting to use libraries or APIs that do not exist. Could you elaborate on how you have solved this problem?
Gemini CLI (it's free and I'm cheap) will run the build process after making changes. If an error occurs, it will interpret it and fix it. That will take care of it using functions that don't exist.
I can get stuck in a loop, but in general it'll get somewhere.
It's a pretty obvious rhetorical tactic: everybody associates "hallucination" with something distinctively weird and bad that LLMs do. Fair enough! But then they smuggle more meaning into the word, so that any time an LLM produces anything imperfect, it has "hallucinated". No. "Hallucination" means that an LLM has produced code that calls into nonexistent APIs. Compilers can and do in fact foreclose on that problem.
If, according to you, LLMs are so good at avoiding hallucinations these days, then maybe we should ask an LLM what hallucinations are. Claude, "in the context of generative AI, what is a hallucination?"
Claude responds with a much broader definition of the term than you have imagined -- one that matches my experiences with the term. (It also seemingly matches many other people's experiences; even you admit that "everybody" associates hallucination with imperfection or inaccuracy.)
Claude's full response:
"In generative AI, a hallucination refers to when an AI model generates information that appears plausible and confident but is actually incorrect, fabricated, or not grounded in its training data or the provided context.
"There are several types of hallucinations:
"Factual hallucinations - The model states false information as if it were true, such as claiming a historical event happened on the wrong date or attributing a quote to the wrong person.
"Source hallucinations - The model cites non-existent sources, papers, or references that sound legitimate but don't actually exist.
"Contextual hallucinations - The model generates content that contradicts or ignores information provided in the conversation or prompt.
"Logical hallucinations - The model makes reasoning errors or draws conclusions that don't follow from the premises.
"Hallucinations occur because language models are trained to predict the most likely next words based on patterns in their training data, rather than to verify factual accuracy. They can generate very convincing-sounding text even when "filling in gaps" with invented information.
"This is why it's important to verify information from AI systems, especially for factual claims, citations, or when accuracy is critical. Many AI systems now include warnings about this limitation and encourage users to double-check important information from authoritative sources."
Right across this thread we have the author of the post saying that when they said "hallucinate", they meant that if they watched they could see their async agent getting caught in loops trying to call nonexistent APIs, failing, and trying again. And? The point isn't that foundation models themselves don't hallucinate; it's that agent systems don't hand off code with hallucinations in it, because they compile before they hand the code off.
To be clear, I did not classify "all the AI-supporters" as being in those three categories, I specifically said the people posting that they are getting 10x improvements thanks to AI.
Can you tell me about what you've done to no longer have any hallucinations? I notice them particularly in a language like Terraform, the LLMs add properties that do not exist. They are less common in languages like Javascript but still happen when you import libraries that are less common (e.g. DrizzleORM).
Your article does not specifically say 10x, but it does say this:
> Kids today don’t just use agents; they use asynchronous agents. They wake up, free-associate 13 different things for their LLMs to work on, make coffee, fill out a TPS report, drive to the Mars Cheese Castle, and then check their notifications. They’ve got 13 PRs to review. Three get tossed and re-prompted. Five of them get the same feedback a junior dev gets. And five get merged.
> “I’m sipping rocket fuel right now,” a friend tells me. “The folks on my team who aren’t embracing AI? It’s like they’re standing still.” He’s not bullshitting me. He doesn’t work in SFBA. He’s got no reason to lie.
That's not quantifying it specifically enough to say "10x", but it is saying in no uncertain terms that engineers using AI are moving fast and everyone else is standing still by comparison. Your article was indeed one of the ones I specifically wanted to respond to, as the language directly contributed to the anxiety I described here. It made me worry that maybe I was standing still. To me, the engineer you described as sipping rocket fuel is an example both of the "degrees of separation" concept (it confuses me that you are pointing to a third party and saying they are trustworthy; why not simply describe your workflow?), and of the idea that a quick burst of productivity can feel huge but just doesn't scale in my experience.
Again, can you tell me about what you've done to no longer have any hallucinations? I'm fully open to learning here. As I stated in the article, I did my best to give full AI agent coding a try, I'm open to being proven wrong and adjusting my approach.
I _never_ made the claim that you could call that 10x productivity improvement. I’m hesitant to categorize productivity in software in numeric terms as it’s such a nuanced concept.
But I’ll stand by my impression that a developer using ai tools will generate code at a perceptibly faster pace than one who isn’t.
I mentioned in another comment the major flaw in your productivity calculation, is that you aren’t accounting for the work that wouldn’t have gotten done otherwise. That’s where my improvements are almost universally coming from. I can improve the codebase in ways that weren’t justifiable before in places that do not suffer from the coordination costs you rightly point out.
I no longer feel like my peers are standing still, because they’ve nearly uniformly adopted ai tools. And again, you rightly point out, there isn’t much of a learning curve. If you could develop before them you can figure out how to improve with them. I found it easier than learning vim.
As for hallucinations I don’t experience them effectively _ever_. And I do let agents mess with terraform code (in code bases where I can prevent state manipulation or infrastructure changes outside of the agents control).
I don’t have any hints on how. I’m using a pretty vanilla Claude Code setup. But I’m not sure how an agent that can write and run compile/test loops could hallucinate.
> I mentioned in another comment the major flaw in your productivity calculation, is that you aren’t accounting for the work that wouldn’t have gotten done otherwise. That’s where my improvements are almost universally coming from. I can improve the codebase in ways that weren’t justifiable before in places that do not suffer from the coordination costs you rightly point out.
I'm a bit confused by this. There is work that apparently is unlocking big productivity boosts but was somehow not justified before? Are you referring to places like my ESLint rule example, where eliminating the startup costs of learning how to write one allows you to do things you wouldn't have previously bothered with? If so, I feel like I covered this pretty well in the article and we probably largely agree on the value of that productivity boost. My point still stands that that doesn't scale. If this is not what you mean, feel free to correct me.
Appreciate your thoughts on hallucinations. My guess is the difference between what we're experiencing is that in your code hallucinations are still happening but getting corrected after tests are run, whereas my agents typically get stuck in these write-and-test loops and can't figure out how to solve the problem, or it "solves" it by deleting the tests or something like that. I've seen videos and viewed open source AI PRs which end up in similar loops as to what I've experienced, so I think what I see is common.
Perhaps that's an indication that we're trying to solve different problems with agents, or using different languages/libraries, and that explains the divergence of experiences. Either way, I still contend that this kind of productivity boost is likely going to be hard to scale and will get tougher to realize as time goes on. If you keep seeing it, I'd really love to hear more about your methods to see what I'm missing. One thing that has been frustrating me is that people rarely share their workflows after making big claims. This is unlike previous hype cycles where people would share descriptions of exactly what they did ("we rewrote in Rust, here's how we did it", etc.) Feel free to email me at the address in my about page[1] or send me a request on LinkedIn or whatever. I'm being 100% genuine that I'd love to learn from you!
This maybe a definition problem then. I don’t think “the agent did a dumb thing that it can’t reason out of” is a hallucination. To me a hallucination is a pretty specific failure mode, it invents something that doesn’t exist. Models still do that for me but the build test loop sets them aright on that nearly perfectly. So I guess the model is still hallucinating but the agent isn’t so the output is unimpacted. So I don’t care.
For the agent is dumb scenario, I aggressively delete and reprompt. This is something I’ve actually gotten much better at with time and experience, both so it doesn’t happen often and I can course correct quickly. I find it works nearly as well for teaching me about the problem domain as my own mistakes do but is much faster to get to.
But if I were going to be pithy. Aggressively deleting work output from an agent is part of their value proposition. They don’t get offended and they don’t need explanations why. Of course they don’t learn well either, that’s on you.
Deleting and re-prompting is fine. I do that too. But even one cycle of that often means the whole prompting exercise takes me longer than if I just wrote the code myself.
A lot of the advantage is that it can make forward progress when I can’t. I can check to see if an agent is stuck, and sometimes reprompt it, in the downtime between meetings or after lunch before I start whatever deep thinking session I need to do. That’s pure time recovered for me. I wouldn’t have finished _any_ work with that time previously.
I don’t need to optimize my time around babysitting the agent. I can do that in the margins. Watching the agents is low context work. That adds the capability to generate working solutions during times that was previously barred from that.
Either way, I'm happy that you are getting so much out of the tools. Perhaps I need to prompt harder, or the codebase I work on has just deviated too much from the stuff the LLMs like and simply isn't a good candidate. Either way, appreciate talking to you!
Good luck ever getting that. I've asked that about a dozen times on here from people making these claims and have never received a response. And I'm genuinely curious as well, so I will continue asking.
What people aren't doing is proving to you that their workflows work as well as they say they do. You want proof, you can DM people for their rate card and see what that costs.
> As of March, 2025, this library is very new, prerelease software.
I'm not looking for personal proof that their workflows work as well as they say they do.
I just want an example of a project in production with active users depending on the service for business functions that has been written 1.5/2/5/10/whatever x faster than it otherwise would have without AI.
Anyone can vibe code a side project with 10 users or a demo meant to generate hype/sales interest. But I want someone to actually have put their money where their mouth is and give an example of a project that would have legal, security, or monetary consequences if bad code was put in production. Because those are the types of projects that matter to me when trying to evaluate people's claims (since those are what my paycheck actually depends on).
Do you have any examples like that?
That code tptacek linked you to? It's part of our (Cloudflare's) MCP framework. Which means all of the companies mentioned in this blog post are using this code in production today: https://blog.cloudflare.com/mcp-demo-day/
There you go. This is what you are looking for. Why are you refusing to believe it?
(OK fine. I guess I should probably update the readme to remove that "prerelease" line.)
I never look at my own readmes so they tend to get outdated. :/
Fixing: https://github.com/cloudflare/workers-oauth-provider/pull/59
1. Would have legal, security, or monetary consequences if bad code was put in production
2. Was developed using an AI/LLM/Agent/etc that made the development many times faster than it otherwise would have (as so many people claim)
I would love to hear an example where "I used Claude to develop this hosting/ecommerce/analytics/inventory management service that is used in production by 50 paying companies. Using an LLM we deployed the project in 4 weeks where it would normally take us 4 months." Or "We updated an out-of-date code base for a client in half the time it would normally take and have not seen any issues since launch"
At the end of the day I code to get paid. And it would really help to be able to point to actual cases where both money and negative consequences of failure are on the line.
So if you have any examples please share. But the more people deflect the more skeptical I get about their claims.
I mean it's pretty simple - there are a lot of big claims that I read but very few tangible examples that people share where the project has consequences for failure. Someone else replied with some helpful examples in another thread. If you want to add another one feel free, if not that's cool too.
At some point you have to accept that no amount of proof will convince someone that refuses to be swayed. It's very frustrating because, while these are wonderful tools already, its clear that the biggest thing that makes a positive difference is people using and improving them. They're still in relative infancy.
I want to have the kind of conversations we had back at the beginning of web development, when people were delighted at what was possible despite everything being relatively awful.
Since my day job is creating systems that need to be operational and predictable for paying clients - examples of front end mockups, demos, apps with no users, etc don't really matter that much at the end of the day. It's like the difference between being a great speaker in a group of 3 friends vs standing up in front of a 30 person audience with your job on the line.
If you have some examples, I'd love to hear about them because I am genuinely curious.
I spent probably a day building prompts and tests and getting an example of failing behavior in Python, and then I wrote pseudocode and had it implement and write comprehensive unit tests in Rust. About three passes and manual review of every line. I also have an MCP that calls out to O3 for a second-opinion code review and passes it back in.
Very fun stuff
I rolled out a PR that was a one shot change to our fundamental storage layer on our hot path yesterday. This was part of a large codebase and that file has existed for four years. It hadn’t been touched in 2. I literally didn’t touch a text editor on that change.
I have first hand experience watching devs do this with payment processing code that handles over a billion dollars on a given day.
When you say you didn't touch a text editor, do you mean you didn't review the code change or did you just look at the diff in the terminal/git?
Because I was the instigator of that change a second code owner was required to approve the PR as well. That PR didn't require any changes, which is uncommon but not particularly rare.
It is _common_ for me to only give feedback to the agents via the GitHub gui the same way I do humans. Occasionally I have to pull the PR down locally and use the full powers of my dev environment to review but I don't think that is any more common than with people. If anything its less common because of the tasks the agents get typically they either do well or I kill the PR without much review.
And this is the problem.
Masterful developers are the ones you pay to reduce lines of code, not create them.
Perhaps start from the assumption that I have in fact spent a fair bit of time doing this job at a high level. Where does that mental exercise take you with regard to your own position on AI tools?
In fact, you don’t have to assume I’m qualified to speak on the subject. Your retort assumes that _everyone_ who gets improvement is bad at this. Assume any random proponent isn’t.
One of the most valuable qualities of humans is laziness.
We're constantly seeking efficiency gains, because who wants to carry buckets of water, or take laundry down to the river?
Skilled developers excel at this. They are "lazy" when they code - they plan for the future, they construct code in a way that will make their life better, and easier.
LLMs don't have this motivation. They will gleefully spit out 1000 lines of code when 10 will do.
It's a fundamental flaw.
I distinctly did not say that. I said your article was one of the ones that made me feel anxious. And it's one of the ones that spurred me to write this article. I demonstrated how your language implies a massive productivity boost from AI. Does it not? Is this not the entire point of what you wrote? That engineers who aren't using AI are crazy (literally the title) because they are missing out on all this "rocket fuel" productivity? The difference between rocket fuel and standing still has to be a pretty big improvement.
The points I make here still apply, there is not some secret well of super-productivity sitting out in the open that luddites are just too grumpy to pick up and use. Those who feel they have gotten massive productivity boosts are being tricked by occasional, rare boosts in productivity.
You said you solved hallucinations, could you share some of how you did that?
I'm trying to write a piece to comfort those that feel anxious about the wave of articles telling them they aren't good enough, that they are "standing still", as you say in your article. That they are crazy. Your article may not say the word 10x, but it makes something extremely clear: you believe some developers are sitting still and others are sipping rocket fuel. You believe AI skeptics are crazy. Thus, your article is extremely natural to cite when talking about the origin of this post.
You can keep being mad at me for not providing a detailed target list, I said several times that that's not what the point of this is. You can keep refusing to actually elaborate on how you use AI day to day and solve its problems. That's fine. I don't care. I care a lot more to talk about the people who are actually engaging with me (such as your friend) and helping me to understand what they are doing. Right now, if you're going to keep not actually contributing to the conversation, you're just kinda being a salty guy with an almost unfathomable 408,000 karma going through every HN thread every single day and making hot takes.
The article in question[0] has the literal tag line:
> My AI Skeptic Friends Are All Nuts
How much saner is someone who isn't nuts compared to someone who is nuts? 10x saner? What do the specific numbers matter, given you're not writing a paper?
You're enjoying the clickbait benefits of using strong language and then acting offended when someone calls you out on it. Yes, maybe you didn't literally say "10x" but you said or quoted things in exactly that same ballpark, and it's worthy of a counterpoint like the OP has provided. They're both interesting articles with strong opinions that make the world a more interesting place, so idk why you're trying to disown the strength with which you wrote your article.
I'm not offended at all. I'm saying: no, I'm not a valid cite for that idea. If the author wants to come back and say "10x developer", a term they used twenty five times in this piece, was just a rhetorical flourish, something they conjured up themselves in their head, that's great! That would resolve this small dispute neatly. Unfortunately: you can't speak for them.
They used it 25 times in their piece, and in your piece you stated that being interested in "the craft" is something people should do in their own time from now on. Strongly implying, if not outright stating, that the processes and practices we've refined for the past 70 years of software engineering need to move aside for the next hotness that has only been out for 6 months. Sure, you never said "10x", but to me it read entirely like you're doing the "10x" dance. It was a good article and it definitely has inspired me to check it out.
However there is a bit of irony in that you're happy to point out my defensiveness as a potential flaw when you're getting hung up on nailing down the "10x" claim with precision. As an enjoyer of both articles I think this one is a fair retort to yours, so I think it a little disappointing to get distracted by the specifics.
If only we could accurately measure 1x developer productivity, I imagine the truth might be a lot clearer.
You're rebutting a claim about your rant that -if it ever did exist- has been backed away from and disowned several times.
From [0]
> > Wait, now you're saying I set the 10x bar? No, I did not.
>
> I distinctly did not say that. I said your article was one of the ones that made me feel anxious. And it's one of the ones that spurred me to write this article.
and from [1]
> I'm trying to write a piece to comfort those that feel anxious about the wave of articles telling them they aren't good enough, that they are "standing still", as you say in your article. That they are crazy. Your article may not say the word 10x, but it makes something extremely clear: you believe some developers are sitting still and others are sipping rocket fuel. You believe AI skeptics are crazy. Thus, your article is extremely natural to cite when talking about the origin of this post.
My post is about how those types of claims are unfounded and make people feel anxious unnecessarily. He just doesn't want to confront that he wrote an article that directly says these words and that those words have an effect. He wants to use strong language without any consequences. So he's trying to nitpick the things I say and ignore my requests for further information. It's kinda sad to watch, honestly.
Speaking of his rant, in it, he says this:
> [Google's] Gemini’s [programming skill] floor is higher than my own.
which, man... if that's not hyperbole, either he hasn't had much experience with the worst Gemini has to offer, or something really bad has happened to him. Gemini's floor is "entirely-gormless junior programmer". If a guy who's been consistently shipping production software since the mid-1990s isn't consistently better than that, something is dreadfully wrong.
That seemed to me be to be the author's point.
His article resonated with me. After 30 years of development and dealing with hype cycles, offshoring, no-code "platforms", endless framework churn (this next version will make everything better!), coder tribes ("if you don't do typescript, you're incompetent and should be fired"), endless bickering, improper tech adoption following the FANGs (your startup with 0 users needs kubernetes?) and a gazillion other annoyances we're all familiar with, this AI stuff might be the thing that makes me retire.
To be clear: it's not AI that I have a problem with. I'm actually deeply interested in it and actively researching it from the math up.
I'm also a big believer in it, I've implemented it in a few different projects that have had remarkable efficiency gains for my users, things like automatically extracting values from a PDF to create a structured record. It is a wonderful way to eliminate a whole class of drudgery based tasks.
No, the thing that has me on the verge of throwing in the towel is the wholesale rush towards devaluing human expertise.
I'm not just talking about developers, I'm talking about healthcare providers, artists, lawyers, etc...
Highly skilled professionals that have, in some cases, spent their entire lives developing mastery of their craft. They demand a compensation rate commensurate to that value, and in response society gleefully says "meh, I think you can be replaced with this gizmo for a fraction of the cost."
It's an insult. It would be one thing if it were true - my objection could safely be dismissed as the grumbling of a buggy whip manufacturer, however this is objectively, measurably wrong.
Most of the energy of the people pushing the AI hype goes towards obscuring this. When objective reality is presented to them in irrefutable ways, the response is inevitably: "but the next version will!"
It won't. Not with the current approach. The stochastic parrot will never learn to think.
That doesn't mean it's not useful. It demonstrably is, it's an incredibly valuable tool for entire classes of problems, but using it as a cheap replacement for skilled professionals is madness.
What will the world be left with when we drive those professionals out?
Do you want an AI deciding your healthcare? Do you want a codebase that you've invested your life savings into written by an AI that can't think?
How will we innovate? Who will be able to do fundamental research and create new things? Why would you bother going into the profession at all? So we're left with AIs training on increasingly polluted data, and relying on them to push us forward. It's a farce.
I've been seriously considering hanging up my spurs and munching popcorn through the inevitable chaos that will come if we don't course correct.
And I think that sentence is a pretty big tell, so ...
https://www.windowscentral.com/software-apps/sam-altman-ai-w...
https://brianchristner.io/how-cursor-ai-can-make-developers-...
https://thenewstack.io/the-future-belongs-to-ai-augmented-10...
So it's not like I'm delivering features in one day that would have taken two weeks. But I am delivering features in two weeks that have a bunch of extra niceties attached to them. Reality being what it is, we often release things before they are perfect. Now things are a bit closer to perfect when they are released.
I hope some of that extra work that's done reduces future bug-finding sessions.
What I'm about to discuss is about me, not you. I have no idea what kind of systems you build, what your codebase looks like, use case, business requirements etc. etc. etc. So it is possible writing tests is a great application for LLMs for you.
In my day to day work... I wish that developers where I work would stop using LLMs to write tests.
The most typical problem with LLM-generated tests on the codebase where I work is that the test code is almost always extremely tightly coupled to the implementation code. Heavy use of test spies is a common anti-pattern. The result is a test suite that tests implementation details rather than "user-facing" behaviour (where the user could be a code-level consumer of the thing you are testing).
The problem with that type of test is that it is fragile. One of the key benefits of automated tests is that they give you a safety net to refactor the implementation to your heart's content without fear of having broken something. If you change an implementation detail and the "user-facing" behaviour does not change, your tests should pass. When tests are tightly coupled to implementation, they will fail, and now your tests, in the worst of cases, might actually be creating negative value for you... since every code change now requires you to keep tests up to date even when the thing you actually care about testing ("is this thing working correctly?") hasn't changed.
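For example (a generic sketch with invented names, not the codebase in question), compare a spy-based test pinned to an implementation detail with one that only checks observable behaviour:

    from unittest.mock import patch
    from app.users import get_user, save_user  # hypothetical module under test

    def test_get_user_calls_cache():  # fragile: breaks if the caching strategy changes
        with patch("app.cache.get") as cache_get:  # "app.cache" is hypothetical too
            cache_get.return_value = {"id": 1, "name": "Ada"}
            assert get_user(1)["name"] == "Ada"
            cache_get.assert_called_once_with("user:1")  # pins an implementation detail

    def test_get_user_returns_saved_user():  # behaviour-focused: survives refactors
        save_user({"id": 1, "name": "Ada"})
        assert get_user(1)["name"] == "Ada"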
The root of this problem isn't even the LLM; it's just that the LLM makes it a million times worse. Developers often feel like writing tests is a menial chore that needs to be done after the fact to satisfy code coverage policy. Few developers, at many organizations, have ever truly worked TDD or learned testing best practices, how to write easy-to-test implementation code, etc.
That problem statement is:
- Not all tests add value
- Some tests can even create dis-value (ex: slow to run, thus increasing CI bills for the business without actually testing anything important)
- Few developers understand what good automated testing looks like
- Developers are incentivized to write tests just to satisfy code coverage metrics
- Therefore writing tests is a chore and an afterthought
- So they reach for an LLM because it solves what they perceive as a problem
- The tests run and pass, and they are completely oblivious to the anti-patterns just introduced and the problems those will create over time
- The LLMs are generating hundreds, if not thousands, of these problems
So yeah, the problem is 100% the developers who don't understand how to evaluate the output of a tool that they are using.
But unlike functional code, these tests are - in many cases - arguably creating disvalue for the business. At least the functional code is a) more likely to be reviewed and code quality problems addressed and b) even if not, it's still providing features for the end user and thus adding some value.
Forcing the discussion of invariants, and property-based testing, seems to improve on the issues you're mentioning (when using e.g. Opus 4), especially when combined with the "use the public API" or interface abstractions.
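A minimal sketch of what that can look like (using the Hypothesis library; slugify here is a hypothetical stand-in for whatever public API is under test):

    from hypothesis import given, strategies as st
    from myapp.text import slugify  # hypothetical public API under test

    @given(st.text())
    def test_slugify_is_idempotent(s):
        # Invariant: applying the function twice changes nothing.
        assert slugify(slugify(s)) == slugify(s)

    @given(st.text())
    def test_slugify_is_url_safe(s):
        # Invariant: output contains only alphanumerics and dashes.
        assert all(c.isalnum() or c == "-" for c in slugify(s))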
For much of what I build with AI, I'm not saving two weeks. I'm saving infinity weeks — if LLMs didn't exist I would have never built this tool in the first place.
Now let's say you use Claude code, or whatever, and you're able to create the same web app over a weekend. You spend 6 hours a day on Saturday and Sunday, in total 12 hours.
That's a 10x increase in productivity right there. Did it make you a 10x better programmer? Nope, probably not. But your productivity went up tenfold.
And at least to me, that's sort of how it has worked. Things I didn't have motivation or energy to get into before, I can get into over a weekend.
For me it's 50-50 reading other people's code and getting a feel for the patterns and actually writing the code.
So no, imho people with no app dev skills cannot just build something over a weekend, at least not something that won't break when the first user logs in.
The article relates about actual, experienced engineers trying to get even better. That's a completely different matter.
That being said, I am a generalist with 10+ years of experience and can spot the good parts from bad parts and can wear many hats. Sure, I do not know everything, but, hey did I know everything when AI was not there? I took help from SO, Reddit and other places. Now, I go to AI, see if it makes sense, apply the fix, learn and move on.
However most paid jobs don't fall into this category.
Overall it feels negligible to me in its current state.
Things like: build a settings system with org, user, and project level settings, and the UI to edit them.
A task like that doesn’t require a lot of thinking and planning, and is well within most developers’ abilities, but it can still take significant time. Maybe you need to create like 10 new files across backend and frontend, choose a couple libraries to help with different aspects, style components for the UI and spend some time getting the UX smooth, make some changes to the webpack config, and so on. None of it is difficult, per se, but it all takes time, and you can run into little problems along the way.
A task like that is like 10-20% planning, and 80-90% going through the motions to implement a lot of unoriginal functionality. In my experience, these kinds of tasks are very common, and the speedup LLMs can bring to them, when prompted well, is pretty dramatic.
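To be fair, the core logic of a feature like that settings example is genuinely small; a hedged sketch (invented names and defaults) of the cascading resolution, with the real time going into the files, UI, and plumbing around it:

    DEFAULTS = {"theme": "light", "notifications": True, "timezone": "UTC"}

    def resolve_settings(org: dict, user: dict, project: dict) -> dict:
        # Later layers win: project overrides user, which overrides org.
        merged = dict(DEFAULTS)
        for layer in (org, user, project):
            merged.update({k: v for k, v in layer.items() if v is not None})
        return merged

    # resolve_settings({"theme": "dark"}, {}, {"timezone": "CET"})
    # -> {"theme": "dark", "notifications": True, "timezone": "CET"}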
This is where I have found LLMs to be most useful. I have never been able to figure out how to get it to write code that isn't a complete unusable disaster zone. But if you throw your problem at it, it can offer great direction in plain English.
I have decades of research, planning, and figuring things out under my belt, though. That may give me an advantage in guiding it just the right way, whereas the junior might not be able to get anything practical from it, and thus that might explain their focus on code generation instead?
It's not a ground-breaking app, it's CRUD and background jobs and CSV/XLSX exports and reporting, but I found that I was able to "wireframe" with real code and thus come up with unanswered questions, new requirements, etc. extremely early in the project.
Does that make me a 10x engineer? Idk. If I wasn't confident working with CC, I would have pushed back on the project in the first place unless management was willing to devote significant resources to this. I.e. "is this really a P1 project or just a nice to have?" If these tools didn't exist I would have written spec's and excalidraw or Sketch/Figma wireframes that would have taken me at least the same amount of time or more, but there'd be less functional code for my team to use as a resource.
It reads like this project would have taken your company 9 weeks before, and now will take the company 9 weeks.
Except it also blurs the lines and sets incorrect expectations.
Management often see code being developed quickly (without full understanding of the fine line between PoC and production ready) and soon they expect it to be done with CC in 1/2 the time or less.
Figma on the other hand makes it very clear it is not code.
I sort of want to get back to that... it was really good at getting ideas across.
1. googling stuff about how APIs work
2. writing boilerplate
3. typing syntax correctly
These three things combined make up a huge amount of programming. But when real cognition is required I find I'm still thinking just as hard in basically the same ways I've always thought about programming: identifying appropriate abstractions, minimizing dependencies between things, pulling pieces together towards a long term goal. As far as I can tell, AI still isn't really capable of helping much with this. It can even get in the way, because writing a lot of code before key abstractions are clearly understood can be counterproductive and AI tends to have a monolithic rather than decoupled understanding of how to program. But if you use it right it can make certain tasks less boring and maybe a little faster.
This is not to disagree with the OP, but to point out that, even for engineers, the speedups might not appear where you expect. [EDIT I see like 4 other comments making the same point :)]
What makes an excellent engineer is risk mitigation and designing systems under a variety of possible constraints. This design is performed using models of the domains involved and understanding when and where these models hold and where they break down. There's no "10x". There is just being accountable for designing excellent systems to perform as desired.
If there were a "10x" software engineer, such an engineer would prevent data breaches from occurring, which is a common failure mode in software to the detriment of society. I want to see 10x less of that.
>What makes an excellent engineer is risk mitigation and designing systems under a variety of possible constraints.
I take it that those fields also don't live by the "move fast and break things" motto?
It's like discussing in a gaming guild how to reach the next level. It isn't real.
1. Tech companies should be able to accelerate and supplant the FAANGs of this world. Like even if 10x were discounted to 5x, it would mean that 10 human-years of work would be shrunk down to 2 to make multi-billion dollar companies. This is not happening right now. If this does not start happening with the current series of models, Murphy's law (e.g. an interest rate spike at some point) or just brutal "show me the money" questions will tell people whether it is "working".
2. I think Anthropic's honcho did a back-of-the-envelope number of $600 for every human in the US (I think it was just the US) being necessary to justify Nvidia's market cap. This should play out by the end of this year or in the Q3 report.
Then it came time to make a change to one of the charts. Team members were asking me questions about it. "How can we make this axis display only for existing data rather than range?" I'm scrolling through code in a screenshare that I absolutely reviewed, I remember doing it, I remember clicking the green arrow in Cursor, but I'm panicking because this doesn't look like code I've ever seen, and I'm seeing gaping mistakes and stupid patterns and a ton of duplicated code. Yeah I reviewed it, but bit by bit, never really all at once. I'd never grokked the entire file. They're asking me questions to which I don't have answers, for code "I'd just written." Man it was embarrassing!
And then to make the change, the AI completely failed at it. Plotly.js's type definitions are super out of date and the Python library is more fleshed out, so the AI started hallucinating things that exist on Python and not in JS - so now I gotta head to the docs anyway. I had to get much more manual, and the autocomplete of cursor was nice while doing so, but sometimes I'd spend more time tab/backspacing after realizing the thing it recommended was actually wrong, than I'd have spent just quickly typing the entire whatever thing.
And just like a hit, now I'm chasing the dragon. I'd love to get that feeling back of entering a new era of programming, where I'm hugely augmented. I'm trying out all the different AI tools, and desperately wishing there was an autocomplete as fast, as multi-line, and as good at jumping around as Cursor's, available in nvim. But they all let me down. Now that I'm paying more attention, I'm realizing the code really isn't good at all. I think it's still very useful to have Claude generate a lot of boilerplate, or come in and make some tedious changes for me, or just write all my tests, but beyond that, I don't know. I think it's improved my productivity maybe 20%, all things considered. Still amazing! I just wish it was as good as I thought it was when I first tried it.
So true, a lot of value and gains are had when tech leads can effectively negotiate and creatively offer less costly solutions to all aspects of a feature.
The co-founder of a company I worked at was one for a period (he is not a 10xer anymore - I don't think someone can maintain that output forever with life constraints). He literally wrote the bulk of a multi-million line system, most of the code is still running today without much change and powering a unicorn level business.
I literally wouldn't believe it, but I was there for it when it happened.
Ran into one more who I thought might be one, but he left the company too early to really tell.
I don't think AI is going to produce any 10x engineers because what made that co-founder so great was he had some kind of sixth sense for architecture, that for most of us mortals we need to take more time or learn by trial and error how to do. For him, he was just writing code and writing code and it came out right on the first try, so to speak. Truly something unique. AI can produce well specified code, but it can't do the specifying very well today, and it can't reason about large architectures and keep that reasoning in its context through the implementation of hundreds of features.
I've been a bit of that engineer (though not at the same scale), like say I wrote 70% of a 50k+ loc greenfield service. But I'm not sure it really means I'm 10x. Sometimes this comes from just being the person allowed to do it, who doesn't get questioned on design choices or on decisions about how to structure and write the code, and who doesn't get any pushback on having massive PRs that others almost just rubber-stamp.
And you can really only do this at the greenfield phase, when things are not yet in production, and there's so much baseline stuff that's needed in the code.
But it ends up being the 80/20 rule, I did the 80% of the work in 20% of the time it'll take to go to prod, because that 20% remaining will eat up 80% of the time.
One of our EMs did this this week. He did a lot of homework: spoke to quite a few experts and pretty soon realised this task was too hard for his team to ever accomplish, if it was even possible. He lobbied the PM, a VP, and a C-level, and managed to stop a lot of wasted work from being done.
Sometimes the most important language to know as a dev is English*
s/English/YourLanguageOfChoice/g
What's your experience? And what do the "kids" use these days to indicate alternative options (as above — though for that, I use bash {} syntax too) or to signal "I changed my mind" or "let me fix that for you"?
They could have just said "the most important language [...] is spoken language".
I am curious if this is still understandable in wider software engineering circles, esp outside the HN and Linux bubbles.
I guess this leaves open question about the distribution of productivity across programmers and the difference between the min and the mean. Is productivity normally distributed? Log normal? Some kind of power law?
Junior: 100 total lines of code a day
Senior: 10,000 total lines of code a day
Guru: -100 total lines of code a day
As in, it's now completely preventing you from doing things you could have before?
[In fact you can sometimes find that 10x bigger diff leads to decreased productivity down the line...]
With enough rules and good prompting this is not true. The code I generate is usually better than what I'd do by hand.
The reason the code is better is that all the extra polish and gold plating is essentially free.
Everything I generate comes out commented, with great error handling, logging, SOLID principles, and unit tests, using established patterns in the code base.
I'm always baffled by this. If you can't do it that well by hand, how can you discriminate its quality so confidently?
I get there is an artist/art-consumer analogy to be made (i.e. you can see a piece is good without knowing how to paint), but I'm not convinced it is transferable to code.
Also, not really my experience when dealing with IaC or (complex) data related code.
Related: agentic LLMs may be slow to produce output, but they are parallelizable by an individual, unlike hand-written work.
With AI the extra quality and polish is basically free and instantaneous.
Point still remains for junior and semi-senior devs though, or any dev trying to leap over a knowledge barrier with LLMs. Emphasis on good pipelines and human (eventually maybe also LLM based) peer-reviews will be very important in the years to come.
Well-written bullshit in perfect prose is still bullshit.
I use "tab-tab" auto complete to speed through refactorings and adding new fields / plumbing.
It's easily a 3x productivity gain. On a good day it might be 10x.
It gets me through boring tedium. It gets strings and method names right for languages that aren't statically typed. For languages that are statically typed, it's still better than the best IDE AST understanding.
It won't replace the design and engineering work I do to scope out active-active systems of record, but it'll help me when time comes to build.
The 5% is an increase in straight-ahead code speed. I spend a small fraction of my time typing code. Smaller than I'd like.
And it very well might be an economically rational subscription. For me personally, I'm subscription averse based on the overhead of remembering that I have a subscription and managing it.
This is emphatically NOT my experience with a large C++ codebase.
It expands match blocks against highly complex enums from different crates, then tab completes test cases after I write the first one. Sometimes even before that.
Just by virtue of Rust being relatively short-lived I would guess that your code base is modular enough to live inside reasonable context limits, and written following mostly standard practice.
One of the main files I work on is ~40k lines of code, and one of the main proprietary API headers I consume is ~40k lines of code.
My attempts at getting the models available to Copilot to author functions for me have often failed spectacularly - as in I can't even get it to generate edits at prescribed places in the source code, or follow examples from prescribed places. And the hallucination issue is EXTREME when trying to use the big C API I alluded to.
That said Claude Code (which I don't have access to at work) has been pretty impressive (although not what I would call "magical") on personal C++ projects. I don't have Opus, though.
Prompts are especially good for building a structural template for a new code module, or basic boilerplate for some of the more verbose environments. E.g. Android Java programming can be a mess: huge amounts of code for something simple like an efficient scrolling view. AI takes care of this - it's obvious code, no thought, but it's still over 100 lines scattered across XML (the view definitions), resources, and multiple Java files.
Do you really want to be copying boilerplate like this across to many different files? Prompts that are well integrated to the IDE (they give a diff to add the code) are great (also old style Android before Jetpack sucked) https://stackoverflow.com/questions/40584424/simple-android-...
https://github.com/micahscopes/radix_immutable
I took an existing MIT-licensed prefix tree crate and had Claude+Gemini rewrite it to support immutable, quickly comparable views. The execution took about one day's work, following two or three weeks of thinking about the problem part-time. I scoured the prefix tree libraries available in Rust, as well as the various existing immutable collections libraries, and found that nothing like this existed. I wanted O(1) comparable views into a prefix tree. This implementation has decently comprehensive tests and benchmarks.
No code for the next two but definitely results...
Tabu search guided graph layout:
https://bsky.app/profile/micahscopes.bsky.social/post/3luh4d...
https://bsky.app/profile/micahscopes.bsky.social/post/3luh4s...
Fast Gaussian blue noise with wgpu:
https://bsky.app/profile/micahscopes.bsky.social/post/3ls3bz...
In both these examples, I leaned on Claude to set up the boilerplate, the GUI, etc, which gave me more mental budget for playing with the challenging aspects of the problem. For example, the tabu graph layout is inspired by several papers, but I was able to iterate really quickly with claude on new ideas from my own creative imagination with the problem. A few of them actually turned out really well.
(edit)
I asked it to generate a changelog: https://github.com/wglb/gemini-chat/blob/main/CHANGELOG.md
In other words, it matters whether the AI is creating technical debt.
That has nothing to do with AI/LLMs.
If you can't understand what the tool spits out either: learn, throw it away, or get it to make something you can understand.
It's not about lines of code or quality; it's about solving a problem. If the solution creates another problem, then it's bad code. If it solves the problem without causing a new one, then great. Move on to the next problem.
Weren't there 2 or 3 dating apps launched before the "vibecoding" craze that became extremely popular and got badly hacked weeks or months in? I also distinctly remember a social network shipping global Firebase tokens on the client side, also a few years ago.
Repeat after me, token prediction is not intelligence.
We went from "this thing is a stochastic parrot that gives you poems and famous people styled text, but not much else" to "here's a fullstack app, it may have some security issues but otherwise it mainly works" in 2.5 years. People expect perfection, and move the goalposts. Give it a second. Learn what it can do today, adapt, prepare for what it can do tomorrow.
LLMs are still stochastic parrots, though highly impressive and occasionally useful ones. LLMs are not going to solve problems like "what is the correct security model for this application given this use case".
AI might get there at some point, but it won't be solely based on LLMs.
Frankly I've seen LLMs answer better than people trained in security theatre so be very careful where you draw the line.
If you're trying to say they struggle with what they've not seen before: yes, provided that what is new isn't within the phase space they've been trained over. Remember, there are no photographs of cats riding dinosaurs, yet SD models can generate them.
I have experimented with vibe coding. With Claude Code I could produce a useful and usable small React/TS application, but it was hard to maintain and extend beyond a fairly low level of complexity. I totally agree that vibe coding (at the moment) is producing a lot of slop code, I just don't think Tea is an example of it from what I understand.
# loop over the images
for filename in images_filenames:
# download the image
image = download_image(filename)
# resize the image
resize_image(image)
# upload the image
upload_image(image)
I read and understand 100% of the code it outputs, so I'm not so worried about going too far astray...
being too prescriptive about it (like prompting "don't write comments") makes the output worse in my experience
I prefer to push for self-documenting code anyway; I never saw the need for docs other than for an API I'm calling like a black box.
What's particularly useful are the comments explaining the reasoning behind new code added at my request.
How often do you use coding LLMs?
But I have rules that are quite important for successfully completing a task to my standards, and it's very frustrating when the LLM randomly ignores them. In a previous comment I explained my experiences in more detail, but depending on the circumstances, instruction compliance is 9/10 at best, with some instructions/tasks as poor as 6/10 in the most "demanding" scenarios, particularly as the context window fills up during a longer agentic run.
Me: Here's the relevant part of the code, add this simple feature.
Opus: here's the modified code blah blah bs bs
Me: Will this work?
Opus: There's a fundamental flaw in blah bleh bs bs here's the fix, but I only generate part of the code, go hunt for the lines to make the changes yourself.
Me: did you change anything from the original logic?
Opus: I added this part, do you want me to leave it as it was?
Me: closes chat
Coding in a chat interface, and expecting the same results as with dedicated tools is ... 1-1.5 years old at this point. It might work, but your results will be subpar.
These conversations on AI code good, vs AI code bad constantly keep cropping up.
I feel we need to build a cultural norm of sharing examples of where we succeeded and where we failed, so that we can get to some sort of comparison and categorization.
The sharing also has to be made non-contentious, so that we get a multitude of examples. Otherwise we’d get nerd-sniped into arguing the specifics of a single case.
Let’s boil this down to an easy set of reproducible steps any engineer can take to wrangle some sense from their AI trip.
It may change in the future, but AI is without a doubt improving our codebase right now. Maybe not 10X but it can easily 2X as long as you actually understand your codebase enough to explain it in writing.
There are at least 10 posts on HN these days with the same discussion going in circles:
1. AI sucks at code
2. you are not using my magic prompting technique
You know, like when the loom came out there were probably quite a few models, but using them was similar. Like cars are now.
I think it's only a matter of time until our roles are commoditized and vibe-coding becomes the norm in most industries.
Vibe coding being a dismissive term for what is really a new skillset. For example, we'll be doing more planning and testing and such instead of writing code. The same way, say, sysadmins just spin up k8s instead of racking servers, or car mechanics read diagnostic codes from readers and, often, just replace an electric part instead of hand-tuning carbs or gapping spark plugs and such. That is to say, a level of skill is being abstracted away.
I think we just have to see this, most likely, as how things will get done going forward.
This reads like empty hype to me, and there's more than one claim like this in these threads, where AI magically creates an app, but any description of the app itself is always conspicuously missing.
I also have never used godot before, and I was surprised at how well it navigated and taught me the interface as well.
At least the horror stories about "all the code is broken and hallucinations" aren't really true for me and my uses so far. If LLMs succeed anywhere, it will be in the overly logical and predictable world of programming languages - that's just a guess on my part - but thus far, whenever I reach for code from LLMs, it's been a fairly positive experience.
I do still disagree with your assessment. I think the syntactic tokens of programming languages have a kind of impedance mismatch with the tokens LLMs operate on, and that the formal semantics of programming languages are a bad fit for fuzzy, statistical LLMs. I firmly believe that increased LLM usage will drive software safety and quality down, simply because a) no semblance of semantic reasoning or formal verification has been applied to the code and b) a software developer will have an incomplete understanding of code not written by themselves.
But our opinions can co-exist, good luck in your game development journey!
As far as QA goes, we then circle back to the tool itself being the cure for the problems the tool brings in, which is typical in technology. The same way agile/'break things' programming's solution to QA was to fire the 'hands on' QA department and then programmatically do QA. Mostly for cost savings, but partly because manual QA couldn't keep up.
I think like all artifacts in capitalism, this is 'good enough,' and as such the market will accept it. The same way my laggy buggy Windows computer would be laughable to some in the past. I know if you gave me this Win11 computer when I was big into low-footprint GUI linux desktop, I would have been very unimpressed, but now I'm used to it. Funny enough, I'm migrating back to kubuntu because Windows has become unfun and bloaty and every windows update feels a bit like gambling. But that's me. I'm not the typical market.
I think your concerns are real and correct factually and ideologically, but in terms of a capitalist market will not really matter in the end, and AI code is probably here to stay because it serves the capital owning class (lower labor costs/faster product = more profit for them). How the working class fares or if the consumer product isn't as good as it was will not matter either unless there's a huge pushback, which thus far hasn't happened (coders arent unionizing, consumers seem to accept bloaty buggy software as the norm). If anything the right-wing drift of STEM workers and the 'break things' ideology of development has primed the market for lower-quality AI products and AI-based workforces.
I was a skeptic until I started using the tool and learning how to get good results.
Now I'm a convert. I like sharing my experiences and getting the cope replies like this one.
I believe his original thesis remains true: "There is no single development, in either technology or management technique, which by itself promises even one order-of-magnitude improvement within a decade in productivity, in reliability, in simplicity."
Over the years this has been misrepresented or misinterpreted to suggest it's false, but it sure feels like "Agentic Coding" is a single development promising a massive multiplier in improvement that, once again, turns out to be another accidental tool that can be helpful but is definitely not a silver bullet.
I'm not sure about agentic coding. Need another month at it.
Here's what the 5x to 10x flow looks like:
1. Plan out the tasks (maybe with the help of AI)
2. Open a Git worktree, launch Claude Code in it, give it the task, and let it work (a rough sketch of this step is below). It gets instructions to push to a GitHub pull request when it's done. It has access to a whole bunch of local tools, test suites, and lots of documentation.
3. While that terminal is running, I go start more tasks. Ideally there are 3 to 5 tasks running at a time.
4. Periodically check on the tabs to make sure they haven't gotten stuck or lost their minds.
5. Finally, review the finished pull requests and merge them when they are ready. If they have issues then go back to the related chat and tell it to work on it some more.
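A rough sketch of what step 2 can look like in practice (my own approximation, not the commenter's actual tooling; it assumes a claude CLI on PATH that accepts a prompt non-interactively, and the names here are made up):

# Sketch: one git worktree per task so parallel agent runs don't collide.
import subprocess
from pathlib import Path

def launch_task(repo: Path, task_name: str, prompt: str) -> subprocess.Popen:
    branch = f"agent/{task_name}"
    worktree = repo.parent / f"{repo.name}-{task_name}"
    # Create an isolated checkout on its own branch (step 2).
    subprocess.run(
        ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(worktree)],
        check=True,
    )
    # Hand the task to the agent and move on to the next one (step 3);
    # check back periodically (step 4) and review the resulting PR (step 5).
    return subprocess.Popen(["claude", "-p", prompt], cwd=worktree)

# e.g. launch_task(Path.home() / "src" / "myapp", "fix-auth-timeout",
#                  "Fix the session timeout bug, run the tests, open a PR when green.")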
With that flow it's reasonable to merge 10 to 20 pull requests every day. I'm sure someone will respond "oh, just because there are a lot of pull requests doesn't mean you are productive!" I don't know how to prove to you that the PRs are productive other than to say that each is basically equivalent to one small PR from a human.
A few notes about the flow:
- For the AI to work independently, it really needs tasks of easy to medium difficulty. There are definitely 'hard' tasks that need a lot of human attention in order to get done successfully.
- This does take a lot of initial investment in tooling and documentation. Basically every "best practice" or code pattern that you want to use in the project must be written down. And the tests must be as extensive as possible.
Anyway, the linked article talks about the time it takes to review pull requests. I don't think it needs to take that long, because you can automate a lot:
- Code style issues are fully automated by the linter.
- Other checks like unit test coverage can be checked in the PR as well.
- When you have a ton of automated tests that are checked in the PR, that also reduces how much you need to worry about as a code reviewer.
With all those checks in place, I think it can be pretty fast to review a PR. As the human you just need to scan for really bad code patterns, and maybe zoom in on highly critical areas, but most of the code can be eyeballed pretty quickly.
It might just be because I don't have a great imagination, but it's very hard for me to see how you basically automate the review process on anything that is business critical or has legal risks.
On the security layer, I wrote that code mostly by hand, with some 'pair programming' with Claude to get the OAuth handling working.
When I have the agent working on tasks independently, it's usually working on feature-specific business logic in the API and frontend. For that work it has a lot of standard helper functions to read/write data for the current authenticated user. With that scaffolding it's harder (not impossible) for the bot to mess up.
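To make the scaffolding idea concrete, here is a minimal sketch (hypothetical names and schema, not this commenter's codebase; sqlite-style placeholders) of the kind of helper the agent is handed: the current-user filter lives in one place, so agent-written feature code can't easily forget it.

# Sketch: user-scoped data-access helpers that bake in the ownership check.
from dataclasses import dataclass

@dataclass(frozen=True)
class CurrentUser:
    id: str  # set by the auth middleware, never by feature code

class NotFoundOrForbidden(Exception):
    pass

def list_invoices(db, user: CurrentUser) -> list:
    # Every read is scoped to the authenticated user by construction.
    return db.execute(
        "SELECT * FROM invoices WHERE owner_id = ?", (user.id,)
    ).fetchall()

def fetch_invoice(db, user: CurrentUser, invoice_id: str):
    row = db.execute(
        "SELECT * FROM invoices WHERE id = ? AND owner_id = ?",
        (invoice_id, user.id),
    ).fetchone()
    if row is None:
        # Deliberately indistinguishable from "does not exist".
        raise NotFoundOrForbidden(invoice_id)
    return row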
It's definitely a concern though, I've been brainstorming some creative ways to add extra tests and more auditing to look out for security issues. Overall I think the key for extremely fast development is to have an extremely good testing strategy.
I think where I've become very hesitant is that a lot of the programs I touch have customer data belonging to clients with pretty hard-nosed legal teams. So it's quite difficult for me to imagine not reviewing the production code by hand.
A lot of senior engineers in the big tech companies spend most of their time in meetings. They're still brilliant. For instance, they read papers and map out the core ideas, but they haven't been in the weeds for a long time. They don't necessarily know all the day-to-day stuff anymore.
Things like: which config service is standard now? What's the right Terraform template to use? How do I write that gnarly PromQL query? How do I spin up a new service that talks to 20 different systems? Or in general, how do I map my idea to deployable and testable code in the company's environment?
They used to have to grab a junior engineer to handle all that boilerplate and operational work. Now, they can just use an AI to bridge that gap and build it themselves.
In some cases, LLMs can be a real speed boost. Most of the time, that has to do with writing boilerplate and prototyping a new "thing" I want to try out.
Inevitably, if I like the prototype, I end up rewriting large swaths of it to make it even halfway productizable. Fundamentally, LLMs are bad at keeping an end goal in mind while working on a specific feature, and they're terrible at holding enough context to avoid code duplication and spaghetti.
I'd like to see them get better and better, but they really are limited to whatever code they can ingest from the internet. A LOT of important code is just not open for consumption in sufficient quantities for them to learn from. For this reason, I suspect LLMs will never be all that good for non-web-based engineering. Where's all the training data gonna come from?
Consider a fully loaded cost of $200k a year for an engineer, or about $16,666 per month. They only have to become a >1.012x engineer for the "AI" to be worth it. Of course that $200 per month is probably VC-subsidized right now, but there is lots of money on the table for even a <2x improvement.
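Spelling out that arithmetic with the same round numbers:

fully_loaded_yearly = 200_000       # engineer cost from the comment above
monthly = fully_loaded_yearly / 12  # ~16,667 per month
subscription = 200                  # AI tooling cost per month

breakeven = 1 + subscription / monthly
print(f"{breakeven:.3f}")  # ~1.012 -> a ~1.2% productivity gain already pays for it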
While Gemini performed well in tweaking visualizations (it even understood the output of matplotlib) and responding to direct prompts, it struggled with debugging and multi-step refactorings, occasionally failing with generic error messages. My takeaway is that these tools are incredibly productive for greenfield coding with minimal constraints, but when it comes to making code reusable or architecturally sound, they still require significant human guidance. The AI doesn’t prioritize long-term code quality unless you actively steer it in that direction.
https://arxiv.org/abs/2507.09089
Obviously it depends on what you are using the AI to do, and how good a job you do of creating/providing all the context to give it the best chance of being successful in what you are asking.
Maybe a bit like someone using a leaf blower to blow a couple of leaves back and forth across the driveway for 30 sec rather than just bending down to pick them up.... It seems people find LLMs interesting, and want to report success in using them, so they'll spend a ton of time trying over and over to tweak the context and fix up what the AI generated, then report how great it was, even though it'd have been quicker to do it themselves.
I think agentic AI may also lead to this illusion of, or reported, AI productivity ... you task an agent to do something and it goes off and 30 min later creates what you could have done in 20 min while you are chilling and talking to your workmates about how amazing this new AI is ...
But maybe another thing is not considered - while things may take longer, they ease cognitive load. If you have to write a lot of boilerplate or you have a task to do, but there are too many ways to do it, you can ask AI to play it out for you.
What benefit I can see the most is that I no longer use Google and things like Stack Overflow, but actual books and LLMs instead.
1) The junior developer is able to learn from experience and feedback, and has a whole brain to use for this purpose. You may have to provide multiple pointers, and it may take them a while to settle into the team and get productive, but sooner or later they will get it, and at least provide a workable solution if not what you may have come up with yourself (how much that matters depends on how wisely you've delegated tasks to them). The LLM can't learn from one day to the next - it's groundhog day every day, and if you have to give up with the LLM after 20 attempts it'd be the exact same thing tomorrow if you were so foolish to try again. Companies like Anthropic apparently aren't even addressing the need for continual learning, since they think that a larger context with context compression will work as an alternative, which it won't ... memory isn't the same thing as learning to do a task (learning to predict the actions that will lead to a given outcome).
2) The junior developer, even if they are only marginally useful to begin with, will learn and become proficient, and the next generation of senior developer. It's a good investment training junior developers, both for your own team and for the industry in general.
I worked with many junior developers that didn't learn and kept making the same mistakes and asking the same questions even months into the job.
I found LLMs to be far more advanced in comparison to what I had to deal with.
An LLM is an auto-regressive model - it is trying to predict continuations of training samples purely based on the training samples. It has no idea what were the real-world circumstances of the human who wrote a training sample when they wrote it, or what the real-world consequences were, if any, of them writing it.
For an AI to learn on the job, it would need to learn to predict its own actions in any specific circumstance (e.g. circumstance = "I'm seeing/experiencing X, and I want to do Y"), based on its own history of success and failure in similar circumstances... what actions led to a step towards the goal Y? It'd get feedback from the real world, same as we do, and therefore be able to update its prediction for next time (in effect "that didn't work as expected, so next time I'll try something different", or "cool, that worked, I'll remember that for next time").
Even if a pre-trained LLM/AI did have access to what was in the mind of someone when they wrote a training sample, and what the result of this writing action was, it would not help, since the AI needs to learn how to act based on what is in its own (ever-changing) "mind", which is all it has to go on when selecting an action to take.
The feedback loop is also critical - it's no good just learning what action to take/predict (i.e. what actions others took in the training set) unless you also have the feedback loop of what the outcome of that action was, and whether that matches what you predicted to happen. No amount of pre-training can remove the need for continual learning for the AI to correct its own on-the-job mistakes and learn from its own experience.
What about just noticing that coworkers are repeatedly doing something that could easily be automated?
* if my Github actions ran 10x faster, so I don't start reading about "ai" on hackernews while waiting to test my deployment and not noticing the workflow was done an hour ago
* if the Google cloud console deployment page had 1 instead of 10 vertical scroll bars and wasn't so slow and janky in Firefox
* if people started answering my peculiar but well-researched stackoverflow questions instead of nitpicking and discussing whether they belong on superuser vs unix vs ubuntu vs hermeneutics vs serverfault
* if MS Teams died
anyway, nice to see others having the same feeling about LLMs
Linear was a very early-stage product I tested a few months after their launch where I was genuinely blown away by the polish and experience relative to their team size. That was in 2020, pre-LLMs.
I have yet to see an equally polished and impressive early-stage product in the past few years, despite claims of 10x productivity.
The credit lies with a more functional style of C++ and TypeScript (the languages I use for hobbies and work, respectively), but Claude has sort of taken me out of the bubble I was brought up in and introduced new ideas to me.
However, I've also noticed that LLM products tend to reinforce your biases. If you don't ask it to critique you or push back, it often tells you what a great job you did and how incredible your code is. You see this with people who have gotten into a kind of psychotic feedback loop with ChatGPT and who now believe they can escape the matrix.
I think LLMs are powerful, but only for a handful of use cases. I think the majority of what they're marketed for right now is techno-solutionism, and there's an impending collapse in VC funding for companies that are plugging ChatGPT APIs into everything from insurance claims to medical advice.
Then unfortunately you're leaving yourself at a serious disadvantage.
Good for you if you're able to live without a calculator, but frankly the automated tool is faster and leaves you less exhausted so you should be taking advantage of it.
I use it similar to the parent poster when I am working with an unfamiliar API, in that I will ask for simple examples of functionality that I can easily verify are correct and then build upon them quickly.
Also, let me know when your calculator regularly hallucinates. I find it exhausting to have an LLM dump out a "finished" implementation and have to spend more time reviewing it than it would take to complete it myself from scratch.
As a junior I used to think it was ok to spend much less time on the review than the writing, but unless the author has diligently detailed their entire process a good review often takes nearly as long. And unsurprisingly enough working with an AI effectively requires that detail in a format the AI can understand (which often takes longer than just doing it).
Yes, and if it isn't, you're being overpaid in the view of a lot of people. Step out of the way and let an expert use the keyboard.
How can you not read and understand code but spend time writing it? That's bad code in that situation.
Source: try working with assembly and binary objects only, which really do require working out what's going on. Code is meant to be human-readable, remember...
For Terraform, specifically, Claude 4 can get thrown into infinite recursive loops trying to solve certain issues within the bounds of the language. Claude still tries to add completely invalid procedures into things like templates.
It does seem to work a bit better for standard application programming tasks.
I wonder if that's all it is, or if the lack of context you mention is a more fundamental issue.
- solo projects
- startups with few engineers doing very little intense code review if any at all
- people who don't know how to code themselves.
Nobody else is realistically able to get 10x multipliers. But that doesn't mean you can't get a 1.5-2x multiplier. Even at a large company that moves slowly, I've been able to realize this type of multiplier on my own work using Cursor/Claude Code. But as mentioned in the article, the real bottleneck becomes processes and reviews. These have not gotten any faster - so in real terms, time to ship/deliver isn't much different than before.
The only attempt we should make at minimizing review times is making them a higher priority than development itself. Technically this should already be the case, but in my experience almost no engineer outside of really disciplined companies or FAANG actually makes reviews a high priority, because unfortunately code reviews are not usually part of someone's performance review and they slow down your own projects. And usually your project manager couldn't give two shits about someone else's work being slow.
Processes are where we can make the biggest dent. Most companies as they get large have processes that get in the way of forward velocity. AI first companies will minimize anything that slows time to ship. Companies simply utilizing AI and expecting 10x engineers without actually putting in the work to rally around AI as a first class citizen will fall behind.
Now for senior developers, AI has been tremendous. Example: I'm building a project where I hit the backend in liveview, and internally I have to make N requests to different APIs in parallel and present the results back. My initial version to test the idea had no loading state, waiting for all requests to finish before sending back.
I knew that I could use Phoenix Channels, and Elixir Tasks, and websockets to push the results as they came in. But I didn't want to write all that code. I could already taste it and explain it. Why couldn't I just snap my fingers?
Well AI did just that. I wrote what I wanted in depth, and bada bing, the solution I would have written is there.
Vibe coders are not gonna make it.
Engineers are having the time of their lives. It's freeing!
Also, one underestimated aspect is that LLMs don’t get writer’s block or get tired (so long as you can pay to keep the tokens flowing).
Also, one of the more useful benefits of coding with LLMs is that you are explicitly defining the requirements/specs in English before coding. This effectively means LLM-first code is likely written via Behavior Driven Development, so it is easier to review, troubleshoot, upgrade. This leads to lower total cost of ownership compared to code which is just cowboyed/YOLOed into existence.
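As a toy illustration of that spec-first flow (my own example with made-up names, not the commenter's code): the English requirement handed to the LLM doubles as the behaviour test you keep around.

# English spec given to the LLM: "Expired coupons must not change the order
# total; valid coupons subtract their discount, floored at zero."
from datetime import date

def apply_coupon(total: float, discount: float, expires: date, today: date) -> float:
    if today > expires:
        return total                   # expired: no effect
    return max(total - discount, 0.0)  # valid: subtract, floored at zero

def test_expired_coupon_does_not_discount():
    assert apply_coupon(100.0, 20.0, date(2024, 1, 1), today=date(2025, 6, 1)) == 100.0

def test_valid_coupon_discounts_and_floors_at_zero():
    assert apply_coupon(15.0, 20.0, date(2030, 1, 1), today=date(2025, 6, 1)) == 0.0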
Basically, the ability to order my thoughts into a task list long & clear enough for the LLM to follow that I can be working on 3 or so of these in parallel, and maybe email. Any individual run may be faster or slower than I can do it manually, but critically, they take less total human time / attention. No individual technique is fundamentally tricky here, but it is still a real skill.
If you read the article, the author is simply not there yet, and sees what they know as only one week's worth of knowledge. So at their learning rate... maybe they need 3x longer of learning and experience?
Internally we expected 15%-25%. A big-3 consultancy told senior leadership "35%-50%" (and then tried to upsell an AI Adoption project). And indeed we are seeing 15%-35% depending on which part of the org you look and how you measure the gains.
https://www.businessinsider.com/ai-coding-tools-may-decrease...
Where I see major productivity gains are on small, tech debt like tasks, that I could not justify before. Things that I can start with an async agent, let sit until I’ve got some downtime on my main tasks (the ones that involve all that coordination). Then I can take the time to clean them up and shepherd them through.
The very best case of these are things where I can move a class of problem from manually verified to automatically verified as that kick starts a virtuous cycle that makes the ai system more productive.
But many of them are boring refactors that are just beyond what a traditional refactoring tool can do.
I doubt that's the commonly desired outcome, but it is what I want! If AI gets too expensive overnight (say 100x), then I'll be able to keep chugging along. I would miss it (claude-code), but I'm betting that by then a second tier AI would fit my process nearly as well.
I think the same class of programmers that yak shave about their editor, will also yak shave about their AI. For me, it's just augmenting how I like to work, which is probably different than most other people like to work. IMO just make it fit your personal work style... although I guess that's problematic for a large team... look, even more reasons not to have a large team!
Now, I don't want to sound like a doomsayer, but it appears to me that application programming and the corresponding software companies are likely to disappear within the next 10 years or so. We're now in a transitional phase where companies who can afford enough AI compute time have an advantage. However, this phase won't last long.
Unless there is some fundamental block to further advances in AI programming, not just simple functions but whole apps will be created from a prompt. However, this is not where it is going to stop. Soon there will be no need for apps in the traditional sense. End users will use AI to manipulate and visualize data, and operating systems will integrate the AI services needed for this. "Apps" can be created on the fly and constantly adjusted to the users' needs.
Creating apps will not remain a profitable business. If there is an app X someone likes, they can prompt their AI to create an app with the same features, but perhaps with these or those small changes, and the AI will create it for them, including thorough tests and quality assurance.
Right now, in the transitional phase, senior engineers might feel they are safe because someone has to monitor and check the AI output. But there is no reason why humans would be needed for that step in the long run. It's cheaper to have 3 AIs quality-test and improve the outputs of one generating AI. I'm sure many companies are already experimenting with this, and at some point the output of such iterative design procedures will have far fewer bugs than any code produced by humans. Only safety-critical essentials such as operating systems and banking will continue to be supervised by humans, though perhaps mostly for legal reasons.
Although I hope it's not the case, the end of software development seems to me a logical long-term consequence of current AI development. Perhaps I've missed something; I'd be interested in hearing from people who disagree.
It's ironic because in my great wisdom I chose to quit my day job in academia recently to fulfill my lifelong dream of bootstrapping a software company. I'll see if I can find a niche, maybe some people appreciate hand-crafted software in the future for its quirks and originality...
The key isn't how much you can speed up the scalable/parallelizable portions, it's how limited you are by the non-scalable/parallelizable aspects.
Even when you do write code, you often only care about specific aspects—you just want to automate the rest.
This is hard to reconcile with modern business models. If you tell someone that a software engineer can also design, they’ll just fire the designer and pile more work on the engineer. But it doesn’t change the underlying truth: a single engineer who can touch many parts of the software with low cognitive friction is simply a better kind of engineer.
AIs so far haven't been able to beat that.
However, I have found AIs to be great when working with unfamiliar tools, where the effort involved in reading the docs etc. far outweighs the benefit. In my case, using AI to generate JasperReports .jrxml files made me more productive.
Ingesting legacy code, understanding it, looking at potential ways to rework it, and then putting in place the axioms to first work with it yourself, and then for others to join in, has gone from months down to weeks and days.
For greenfield development from scratch, statically typed languages seem to work a bit better than dynamic ones.
Putting enough information around the requirements, and how to structure and undertake them, is critical, or it can turn into cowboy coding pretty easily; by default the AI leans towards the average of its corpus, not the best. That's where the developer comes in.
Not my experience.
You can instruct Claude Code to respect standards and practices of your codebase.
In fact, I've noticed that Claude Code has forced me to do a few genuinely important things, like documenting more, writing more E2E tests, and tracking architectural and style changes.
Not only am I forcing myself into a consistent (and well-thought-out) style, I also need it later to feed to the AI itself.
Seriously, I don't want to offend anyone, but if you believe that AI doesn't make you more productive, you've got skill issues in adopting new tools and using them at what they are good at.
I find that getting from zero to 80-90% functionality on just about any piece of software these days is exceedingly easy. So I wonder if AI just rides that wave. Software development is maturing now such that making software, with or without AI, feels 10-100x faster. I suspect it is partially due to the profound leap that has been made in collaborative tools, compilers, languages, open source methodology, etc.
Any tool can be shown to increase performance in closed conditions and within specific environments, but when you try to generalize things do not behave consistently.
Regardless, I would always argue that trying new tech/tools/workflows is better than being stuck in your ways, whatever the productivity results. I do like holding off on new things until they mature a bit before trying them, though.
It makes everyone “produce more code” but your worst dev producing 10X the code is not 10X more productive.
There’s also a bit of a dunning Kruger effect where the most careless people are the most likely to YOLO thousands of lines of vibecode into prod. While a more meticulous engineer might take a lot more time to read the changes, figure out where the AI is wrong, and remove unnecessary code. But the second engineer would be seen as much much less productive than the first in this case
Totally agree, IMO there's a lot of potential for these tools to help with code understanding and not just generation. Shameless plug for a code understanding tool we've been working on that helps with this: https://github.com/sourcebot-dev/sourcebot
Perfectly put. I've been using a lot of AI for shell scripting. Granted I should probably have better knowledge of shell but frankly I think it's a terrible language and only use it because it enjoys wide system support and is important for pipelining. I prefer TS (and will try to write scripts and such in it if I can) and for that I don't use AI almost at all.
Much of production software engineering is writing boilerplate, building out test matrices and harnesses, and scaffolding structure. And often it's for very similarly shaped problems at their core, regardless of the company, organization, or product.
AI lets me get a lot of that out of the way and focus on more interesting work.
One might argue that’s a failure of tools or even my own technique. That might be true, but it doesn’t change the fact that I’m less bored than I used to be.
> Oh, and this exact argument works in reverse. If you feel good doing AI coding, just do it. If you feel so excited that you code more than ever before, that's awesome. I want everyone to feel that way, regardless of how they get there.
I enjoyed the article, fwiw. Twitter was insufferable before Elon bought it, but the AI bro scene is just...wow. An entire scene who only communicate in histrionics.
Actually writing software was only like 15-20% of my time though so the efficiency wins from having an LLM write the code is somewhat limited. It’s still another tool that makes me more productive but I’ve not figured out a way to really multiplicatively increase my productivity.
Exactly. I spend less than 20% of my time writing code. If LLMs 1,000,000,000-x'd my code writing, it would make me 1.25x as efficient overall, not 10x as efficient. It's all influencer hype nonsense, just like pair programming and microservices and no-code companies and blockchain.
This assumes the acceleration happens on all tasks. Amdahl's law says the overall speedup is limited by the fraction of the work that is actually accelerated. Probably it's just unclear whether "engineer productivity" here means the programming part or the overall process.
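A quick sketch of that constraint, assuming coding is 20% of the job (the exact share obviously varies; the point is the ceiling):

def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    """Amdahl's law: only the coding fraction of the job gets accelerated."""
    return 1 / ((1 - coding_fraction) + coding_fraction / coding_speedup)

print(overall_speedup(0.20, 2))    # ~1.11x
print(overall_speedup(0.20, 10))   # ~1.22x
print(overall_speedup(0.20, 1e9))  # -> 1.25x ceiling when coding is 20% of the job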
Ironically, when I listen to vinyl instead of streaming, I listen to less music.
If I'm in the zone, I will often go minutes between flipping the record or choosing another one; even though my record player is right next to me.
That's when/if you're giving it your full attention. I used to do that when I was younger, but much less frequently now.
That being said, there's something hypnotic about watching a record spin, and seeing the needle in the groove. I don't do it now that I'm older, but my kids used to specifically ask me to play a record just so they could see it spin.
- vibe coding is fun, but not production-ready software engineering
- LLMs like CC today moderately boost your performance. A lot of attention is still needed.
- some TDD style is needed for the AI tool to converge
- based on the growth of the last few months, it is quite likely that these tools will increase IC productivity substantially
- fully autonomous agentic coding will take more time as the error rate needs to decline significantly
The problem is...
1. the enormous investment of $$$ produces a too-big-to-fail scenario where extravagant claims will be made regardless
2. leadership has made big promises around productivity and velocity for eng
The end result of this is going to be a lot of squinting at the problem, ignoring reality, and declaring victory. These AI tools are very useful for automating chores and grunt tasks.
The amount of product ideation, story point negotiation, bugfixing, code review, waiting for deployments, testing, and QA that goes into what was traditionally 3 months of work is now getting done in 7 work days? For that to happen, each and every one of these bottlenecks has to also have seen 10x productivity gains.
Context: EACL is an embedded, SpiceDB-compatible ReBAC authorization library, built in Clojure and backed by Datomic.
In addition to the article, I'd like to add that most DEV jobs I have been in had me coding only 50% of my time at most. The rest of the time was spent in meetings, gathering requirements and investigating Prod issues.
Or the data is showing something else entirely: possibly, a company starts telling engineers to use AI, then RIFs a huge portion, and expects the remaining engineers to pick up the slack. They now claim "we're more efficient!" when they've just asked their employees to work more weekends.
It is not making us 10x productive. It is making it 10x easier.
If your organization is routinely spending 3 months on a code review, it sounds like there's probably a 10 to 100x improvement you can extract from fixing your process before you even start using AI.
But if your system records internal state in English and generates code while handling requests, complex systems can become much simpler. You can build things that were impossible before.
Any codebase that's difficult for me to read would be way too large to use an LLM on.
Interesting observation. I am inclined to agree with this myself. I'm more of a 10^0 kind of developer though.
When I use Claude Code on my personal projects, it's like it can read my mind. As if my project is coding itself. It's very succinct and consistent. I just write my prompt and then I'm just tapping the enter key; yes, yes, yes, yes.
I also used Claude Code on someone else's code and it was not the same experience. It kept trying to implement dirty hacks to fix stuff but couldn't get very far with that approach. I had to keep reminding it "Please address the root cause" or "No hacks" or "Please take a step back and think harder about this problem." There was a lot of back-and-forth where I had to ask it to undo stuff and I had to step in and manually make certain changes.
I think part of the issue is that LLMs are better at adding complexity than at removing it. When I was working on the bad codebase, the times I had to manually intervene, the solution usually involved deleting some code or CSS. Sometimes the solution was really simple and just a matter of deleting a couple of lines of CSS but it couldn't figure it out no matter how I wrote the prompt or even if I hinted at the solution; it kept trying to solve problems by adding more code on top.
That means that good developers are more productive, and bad developers create more work for everyone else at a very rapid pace.
Because AI gets you to the next constraint even faster :)
LLMs make writing code quick, that's it. There's nothing more to this. LLMs aren't solutioning nor are they smart. If you know what you want to build, you can build quick. Not good, quick.
That said, if managers don't care about code quality (because customers don't care either), then who am I to judge them. I don't care.
I'm on the edge of just blacklisting the word AI from my feed.
The full year is just more of the above.
When you're not sure if what someone says makes sense, trust common sense, your own experience, and your thinking.
You have to change the organization.
- no peer code review; you review the AI output and that's enough
- devs need authority to change code anywhere in the company. No more team A owns service A and team B owns service B
- every dev and ops person needs to be colocated, no more waiting for timezones
- PMs and engineers are the same role now
Will it work for every company? No: if you are building a pacemaker, don't use AI. Will things break? Yes, sometimes, but you can roll back.
Will things be somewhat chaotic? Yes, somewhat, but what did you think going 10x would feel like?
The part I disagree with: I've never worked at a company that has a 3-month cycle from code-written to code-review-complete. That sounds insane and dysfunctional, and AI won't fix an organization like that.
The better argument is that software engineers spend a lot of time doing things that aren't writing code and aren't being accelerated by any AI code assistant.
If I can write blue-sky / greenfield code - brand new code in a new repo, no rules, just write - I can write tons of code. What bogs me down are things like tests. It can take more time to write tests than the code itself in my actual work project. Of course I know the tests are important, and maybe the LLM can help here; I'm just saying that they slow me down. Waiting for code reviews slows me down. Again, they're super useful, but coming from a place where the first 20-25 years of my career I didn't have them, they are a drag on my performance. Another factor is just the size of the project I'm on: > 500 programmers on my current large project. Assume it's an OS. It's just hard to make progress on such a large project compared to a small one. And yet another, which is part of the first, is other people's code. If I write the whole thing, or most of it, then I know exactly what to change. I've written features in days, in code I know, that I believe would have taken someone unfamiliar with the code months. But me editing someone else's code, without the entire state of the code base in my head, is 10x slower.
That's a long way of saying, many 10xers might just be in the right circumstance to provide 10x. You're then compared against them but you're not in the same circumstance so you get different results.
I used to not really believe people like that existed but it turned out they're just rare enough that I hadn't worked with any yet. You could definitely go a whole career without ever working with any 10x engineers.
And also it's not like they're actually necessary for a project to succeed. They're very good but it's extremely unlikely that a project will succeed on the back of one or two very good engineers. The project I worked with them on failed for reasons nothing to do with us.
My use case is not for a 10x engineer but instead for *cognitive load sharing*. I use AI in a "non-linear" fashion. Do you? Here is what that means:
1. Brainstorm an idea and write down a detailed enough plan. Like "tell me how I might implement something" or "here is what I am thinking, can you critique it and compare it with other approaches?" Then I quickly meet with 2 more devs and we make a design decision about which one to use.
2. Start manual coding and let AI "fill the gaps": write these tests for my code, or follow this already existing API and create the routes from this new spec. This is non-linear because I would complete 50-75% of the feature and let the rest be completed by AI.
3. I am tired and about to end my shift and there is this one last bug. I go read the docs, but I also ask AI to read my screen and come up with some hypotheses. I decide which hypotheses are most promising after some reading and then ask the AI to just test that (not fix it in auto mode).
4. Voice mode: I have a shortcut that triggers claude code and uses it like a quick "lookup/search" in my code base. This avoids context switching.
For newer languages, packages, and hardware-specific code, I have yet to use a single frontier model that has not slowed me down by 50%. It is clear to me that LLMs are regurgitating machines, and no amount of "thinking" will get around the fact that the transformer architecture (all ML, really) extrapolates poorly beyond what is in the training canon.
However, on zero-to-one projects that are unconstrained by my mag-seven employer, I am absolutely 10x faster. I can churn through boilerplate code, have faster iterations across system design, and generally move extremely fast. I don't use agentic coding tools as I have had bad experiences in how the complexity scales, but it is clear to me that startups will be able to move at lightning pace relative to the large tech behemoths.
Where CC has excelled:
- New well-defined feature built upon existing conventions (10x+ boost)
- Performing similar mid-level changes across multiple files (10x+ boost)
- Quickly performing large refactors or architecture changes (10x+ boost)
- Performing analysis of existing codebases to help build my personal understanding (10x+ boost)
- Correctly configuring UI layouts (makes sense: this is still pattern-matching, but the required patterns can get more complex than a lot of humans can quickly intuit)
Where CC has floundered or wasted time:
- Anything involving temporal glitches in UI or logic. The feedback loop just can't be accomplished yet with normal tooling.
- Fixing state issues in general. Again, the feedback loop is too immature for CC to even understand what to fix unless your tooling or descriptive ability is stellar.
- Solving classes of smallish problems that require a lot of trial-and-error, aren't covered by automated tests, or require a steady flow of subjective feedback. Sometimes it's just not worth setting up the context for CC to succeed.
- Adhering to unusual or poorly-documented coding/architecture conventions. It's going to fight you the whole way, because it's been trained on conventional approaches.
Productivity hacks:
- These agents are automated, meaning you can literally have work being performed in parallel. Actual multitasking. This is actually more mentally exhausting, but I've seen my perceived productivity gains increase due to having 2+ projects going at once. CC may not beat a single engineer for many tasks, but it can literally do multiple things at once. I think this is where the real potential comes into play. Monitoring multiple projects and maintaining your own human mental context for each? That's a real challenge.
- Invest in good context documents as early as possible, and don't hesitate to ask CC to insert new info and insights in its documents as you go. This is how you can help CC "learn" from its mistakes: document the right way and the wrong way when a mistake occurs.
Background: I'm a 16yoe senior fullstack engineer at a startup, working with React/Remix, native iOS (UIKit), native Android (Jetpack Compose), backends in TypeScript/Node, and lots of GraphQL and Postgres. I've also had success using Claude Code to generate Elixir code for my personal projects.
But is a 10x engineer going to become a 100x engineer?
> It tends to struggle with languages like Terraform
The language is called HCL (HashiCorp Configuration Language).
So two groups are talking past one another. Someone has a completely new idea, starts with nothing and vibe codes a barely working MVP. They claim they were able to go from 0 to MVP ~10x faster than if they had written the code themselves.
Then some seasoned programmer hears that claim, scoffs and takes the agent into a legacy code base. They run `/init` and make 0 changes to the auto-generated CLAUDE.md. They add no additional context files or rules about the project. They ask completely unstructured questions and prompt the first thing that comes into their minds. After 1 or 2 days of getting terrible results they don't change their usage or try to find a better way, they instead write a long blog post claiming AI hype is unfounded.
What they ignore is that even the maximalists are stating: 30%-50% improvement on legacy code bases. And that is if you use the tool well.
This author gets terrible results and then says: "Dark warnings that if I didn't start using AI now I'd be hopelessly behind proved unfounded. Using AI to code is not hard to learn." How sure is the author that they actually learned to use it? "A competent engineer will figure this stuff out in less than a week of moderate AI usage." One of the most interesting things about learning are those things that are easy to learn and hard to master. You can teach a child chess, it is easy to learn but it is hard to master.
Maybe LLMs make you 10x faster at using boilerplate-heavy things like Shadcn/ui or Tanstack.
...which is still only about half as fast as using a sane ecosystem.
IMO this is why there's so many diverging opinions about the productivity of AI tools.
But every company is going to enshittify everything they can to pigeonhole AI use to justify the grifters' costs.
I look forward to a few years from now, when these companies trying to save money at any cost have to pay senior developers to rip all this garbage out.
Not really? That's defining productivity as latency, but it's at least as valid to define productivity as throughput.
And then all the examples that are just about time spent waiting become irrelevant. When blocked waiting on something external, you just work on other things.
My point about waiting for things like code review is that it creates a natural time floor; the context switching takes time and slows down other work. If you have 10x as much stuff to get reviewed, all the time lost to context switching is multiplied by 10x.
There is no secret herbal medicine that prevents all disease sitting out in the open if you just follow the right Facebook groups. There is no AI coding revolution available if you just start vibing. You are not missing anything.
Trust yourself. You are enough.
Oh, and don't scroll LinkedIn. Or Twitter. Ever.
This is all you need to take away from this article. Social media is a cesspool of engagement farmers dropping BS takes to get you to engage out of FOMO or anger. Every time I'm on there, I am instantly reminded why I quit going there. It's not genuine, and it's designed to pull your attention away from more important things.
I've been using LLMs on my own for the past few years, and we just recently got our own first-party model that we can now use for work. I'm starting to get into agentic actions where I can integrate with Confluence, GitHub, Jira, etc. It's a learning curve for sure, but I can see where it will lead to some productivity gains. The roadblocks are still real, though, especially when working with other teams. Whether you're waiting for feedback or for a ticket to be worked on, the LLM might speed-run you to a solution, but you'd better be ready with the next thing and the next thing while you're waiting.