That seems obvious, but a consequence is that people who are sceptical of AI (like me) only use it when they've exhausted other resources (like Google). You end up asking very specific questions where not a lot of documentation is available, and inevitably even o3 ends up being pretty useless.
Conversely, there are people who love AI and use it for everything, and since the majority of the stuff they ask about is fairly simple and well documented (e.g. "write me some TypeScript"), they rarely have a negative experience.
I suppose it would be simpler to compare productivity for people working on standard, "normalized" tasks, but often every other task a programmer is assigned is something different to the previous one.
Like can we determine the productivity of doctors, lawyers, journalists, or pastry chefs?
What job out there is so simple that we can meaningfully measure all the positive and negative effects of the worker, as well as account for different conditions between workers?
I could probably get behind the idea that you could measure productivity for professional poker players (given a long enough evaluation period). Hard to think of much else.
The British government (probably not any worse than anyone else, just what I am most familiar with) does measure the productivity of the NHS: https://www.england.nhs.uk/long-read/nhs-productivity/ (including doctors, obviously).
They also try to measure the performance of teachers and schools, and introduced performance league tables and special exams (SATs - exams sat at various ages by school children in the state system, nothing like the American exams with the same name) to do this more pervasively. They made it better by creating multi-academy trusts, which add a layer of management running multiple schools, so even more people want even more metrics.
The same for police, and pretty much everything else.
And to be fair, some CRUD work is repetitive enough that it should be possible to get a fair measure of at least the difference in speed between developers.
But the fact that building simple CRUD services with REST interfaces takes as much time as it does is a failure of the tools we use.
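To make concrete how much ceremony even a trivial resource needs, here's roughly what one CRUD-over-REST service looks like; a minimal sketch assuming FastAPI and an in-memory dict standing in for a database, with all names purely illustrative:

# Minimal sketch: FastAPI + an in-memory dict as a stand-in database.
# Names are illustrative, not from any particular project.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

class Item(BaseModel):
    id: int
    name: str

app = FastAPI()
db: dict[int, Item] = {}

@app.post("/items")
def create_item(item: Item) -> Item:
    db[item.id] = item
    return item

@app.get("/items/{item_id}")
def read_item(item_id: int) -> Item:
    if item_id not in db:
        raise HTTPException(status_code=404, detail="not found")
    return db[item_id]

@app.put("/items/{item_id}")
def update_item(item_id: int, item: Item) -> Item:
    db[item_id] = item
    return item

@app.delete("/items/{item_id}")
def delete_item(item_id: int) -> dict:
    db.pop(item_id, None)
    return {"deleted": item_id}

Multiply that by every resource in a product and the repetitiveness of the work becomes obvious.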
Yes, yes we can.
Programmers really need to stop this cope about us being such special snowflakes that we can't be assessed, and that our managers just need to take it on good faith that we're worth keeping around.
Like I get that in SWE (like all other fields), managers have to make judgement calls and try to evaluate which reports contribute the most, but the GP post seemed surprised that this wasn't a solved problem by now, which just seems incomprehensible to me.
Of course we can. But can we do it in a meaningful way, such that the metric itself doesn't become the target of optimization?
"When a measure becomes a target, it ceases to be a good measure"
Could you make an effort to explain how, or at the very least link to some reasoning? Otherwise your comment is basically the equivalent of “nuh-uh”, which doesn’t meaningfully contribute to the discussion.
> Programmers really need to stop this cope about us being such special snowflakes
Which is not at all what is happening in your parent comment. On the contrary, they’re putting developers on even footing with other professions.
Management is forced to rely on various metrics which are gamed or inaccurate.
On the other hand, it makes no sense from some points of view. For example, if you get a pay rise, that does not mean you are more productive.
I made great money running my own businesses, but the vast majority of the programming was by people I hired. I’m a decent talent, but that gave me the ability to hire better ones than me.
Changing jobs typically brings a higher salary than your previous job. Are you saying that I'm significantly more productive right after changing jobs than right before?
I recently moved from being employed by a company to do software development, to running my own software development company and doing consulting work for others. I can now put in significantly fewer hours, doing the same kind of work (sometimes even on the same projects that I worked on before), and make more money. Am I now significantly more productive? I don't feel more productive, I just learned to charge more for my time.
IMO, your suggestion falls on its own ridiculousness.
Some of the most productive devs don't get paid by the big corps who make use of their open source projects, hence the constant urging of corps and individuals to sponsor the projects they make money from.
What about countries? Here in Poland, $25k would be an amazing salary for a senior, while in the USA fresh grads can earn $80k. Are they more productive?
... at the same time, given the same seniority, job and location, I'd be willing to say it wouldn't be a bad heuristic.
How I measure performance is how many features I can implement in a given period of time.
It's nice that people have done studies and have opinions, but for me, it's 10x to 20x better.
Someone already operating at the very limit of their abilities, doing stuff that is for them high complexity, high cognitive load, detail intense, and tactically non-obvious? Even a machine that just handed you the perfect code can't 20x your real output; even if it gave you the source file at 20x your native sophistication, you wouldn't be able to build and deploy it, let alone make changes to it.
But even if it's only the last 5-20% after you're already operating at your very limit and trying to hit that limit every single day, that is massive: it makes a bunch of stuff on the bubble go from "not realistic" to "we did that".
A key skill is to sense when the AI is starting to guess at solutions (no different to human devs) and then either lean on another AI or reset the context and start over.
I'm finding the code quality increases greatly with the addition of the text 'and please follow best practices because will be pen tested on this!' and wow... it takes it much more seriously.
Most of the coding needed to give people CRUD interfaces to resources is all about copy / pasting and integrating tools together.
Sort of like the old days when we were patching all those copy/pastes from StackOverflow.
Too little of full stack application writing is truly unique.
It would be interesting to set up an MCP-style interface, but even me copy/pasting between windows was constructive.
The time this worked best was when I was building a security model for an API that had to be flexible and follow best practices. It was interesting seeing ChatGPT compare and contrast against major API vendors, and Claude Code asking the detailed implementation questions.
The final output was a pragmatic middle-ground between simplistic and way too complex.
Also, I disagree. For web dev at least, most people are just rewriting the same stuff in a different order. Even though the entire project might be complex from a high-level perspective, when you dive into the components, or even just a single route, it ain't "high complexity" at all. And since I believe most jobs are in web/app dev, which just recycles the same code over and over again, that's why there are a lot of people claiming huge boosts to productivity.
How much of the code you write is actually like this? I work in the domain of data modeling; for me, once the math is worked out, the majority of the code is "trivial". The kind of code you are talking about is maybe 20% of my time. Honestly, it's also the most enjoyable 20%. I would be very happy if that were all I worked on, with the rest of it done by AI.
When you zoom in, even this kind of work isn't uniform - a lot of it is still yak shaving, boring chores, and tasks that are hard dependencies for the work that is truly cognitively demanding, but are themselves easy(ish) annoyances. It's those subtasks - and the extra burden of mentally keeping track of them - that set the limit of what even the most skilled, productive engineer can do. Offloading some of that to AI frees up mental capacity for the work that actually benefits from it.
> Even a machine that just handed you the perfect code can't 20x your real output; even if it gave you the source file at 20x your native sophistication, you wouldn't be able to build and deploy it, let alone make changes to it.
Not true if you use it right.
You're probably following the "grug developer" philosophy, as it's popular these days (as well as "but think of the juniors!", which is the perceived ideal in the current zeitgeist). By design, this turns coding into boring, low-cognitive-load work. Reviewing such code is, thus, easier (and less demoralizing) than writing it.
20x is probably a bit much across the board, but for the technical part, I can believe it - there's too much unavoidable but trivial bullshit involved in software these days (build scripts, Dockerfiles, IaaS). Preventing deep context switching on those is a big time saver.
Yeah, I'm not a dev, but I can see why this is true, because it's also the argument I use in my job as an academic. Some people say "but your work is intellectually complex, how can you trust LLMs to do research, etc.?", and of course, I don't. But 80% of the job is not actually intellectually complex; it's routine stuff. These days I'm writing the final report of a project and half of the text is being generated by Gemini; when I write the data management plan (which is even more useless), probably 90% will be generated by Gemini. This frees a lot of time that I can devote to the actual research. And the same when I use it to polish a grant proposal, generate some code for a chart in a paper, reformat a LaTeX table, brainstorm some initial ideas, come up with an exercise for an exam, etc.
Tons of dev work is not exciting. I have already launched a solo dev startup that was acquired, and the 'fun' part of that coding was minimal. Too much of it was scaffolding, CRUD endpoints, web forms, build scripts, and endpoint documentation, and the truly innovative stuff was such a small part of the whole project. Of the 14 months of work, only 1 month was truly innovative.
When I said "after you've done all the other stuff", I was including cutting out all the ridiculous bullshit that's been foisted on an entire generation of hackers to buy yachts for Bezos and shit.
I build clean libraries from source with correct `pkg-info` and then anything will build against it. I have well-maintained Debian and NixOS configurations that run on non-virtualized hardware. I use an `emacs` configuration that is built-to-specifications, and best-in-class open builds for other important editors.
I don't even know why someone would want a model spewing more of that garbage onto the road in front of them. Once you're running a tight, optimized stack to begin with, the model emulates to some degree the things it sees, and its outputs are good too.
Like: Why isn't this working? Here Claude, read this like 90-page PDF and tell me where I went wrong interfacing with this SDK.
Ohh, I accidentally passed async_context_background_threading_safe instead of async_context_thread_safe_poll and so now it's panicking. Wow, that would have taken me forever.
Stage magicians say that the magic happens in the audience's memory after the trick is done. It's the effect of the activity.
AI coding tools make developers happier and able to spend more brainpower on actually difficult things. But overall, perhaps the amount of work done isn't up by orders of magnitude; it just feels like it.
Waze, the navigation app, sends you along non-standard routes so that you are not stuck in traffic, and it feels fast because you are making progress. But the time taken may be longer and the distance travelled may be further!
Being stuck in traffic and not moving even for a little bit makes you feel that time has stopped; it's boring and frustrating. Now developers need never be stuck. Their roads will be clear, but they may take longer routes.
We get little boosts of dopamine using AI tools to do stuff. Perhaps we use these signals as indicators of productivity: "Ahh, that day's work felt good, I did a lot."
Can't help but note that in 99% of cases this "difficult things" trope makes little sense. In most jobs, the freed-up time is either spent on other stupid tasks or is lost to org inefficiencies. :)
You're not "stuck in traffic"; you are the traffic. If the app spreads users around and this means they don't end up in traffic jams, it's effectively preventing traffic jams from forming.
I liked your washing machine vs. sink example that I see you just edited out. The machine may do it slower and less efficiently than you'd do in the sink, but the machine runs in parallel, freeing you to do something else. So is with good use of LLMs.
For Waze, even if you are the traffic and others go around you, you still may get there quicker and your car may use less energy than by taking the suggested route that feels faster. Others may feel happier and feel like they were quicker, though. Indeed they were moving faster, but they might have taken a longer journey.
Also, generally, most people around here don't use the app enough to effect significant changes in road use. But if they did, I'm not sure (though I'm having fun trying to think) what metaphor we could apply to the current topic :)
So I work 8 hours a day (to get money to eat) and code another 4 hours at home at night.
Weekends are both 10 hour days, and then rinse / repeat.
Unfortunately some projects are just hard to do and until now, they were too hard to attempt to solve solo. But with AI assistance, I am literally moving mountains.
The project may still be a failure but at least it will fail faster, no different to the pre-AI days.
It means you alone can replace a whole team of developers.
I can believe that some tasks are sped up by 10x or even 20x, but I find it very hard to believe that's the average of your productivity (while maintaining good code quality).
So if I finish a carded-up block of work that was expected to take 2 weeks (80 hours) in 1 day (8 hours), then that would be a 10x boost.
There are always tar pits of time where you are no better off with AI, but sometimes it's 20x.
I've set up development teams in the past, and have been coding since the late '70s, so I am sort of aware of my capabilities.
It super depends on the type of work you're doing.
I mean, it's literally unbelievable.
So were the people taking part in the study. Which is why we do these studies: to understand where our understanding of ourselves is lacking.
Maybe you are special and do get extra gains. Or maybe you are as wrong about yourself as everyone else and are overestimating the gains you think you have.
When a measure becomes a target, it ceases to be a good measure.
You can build a new product company with 20 people. Probably in the same domain as you are in right now.
I ended up asking it how it wanted to work, and whether an 'AdminKit Template' would work to get things moving.
It recommended AdminKit and that was a good move.
For me, custom UIs aren't a big part of the solution; I just need web pages to manage CRUD endpoints for the product.
AdminKit has been a good fit so far, but it was a fresh start, no migration.
Recently, there was a story about a developer who was able to crush interviews and get parallel full-time jobs at several start-ups. Initially he was able to deliver, but then not so much.
Somehow your case reminds me of this, with AI as the overemployed developer.
https://repo.autonoma.ca/notanexus.git
I don't know the PDF.js library. Writing both the client- and server-side for a PDF annotation editor would have taken 60 hours, maybe more. Instead, a combination of Copilot, DeepSeek, Claude, and Gemini yielded a working prototype in under 6 hours:
https://repo.autonoma.ca/notanexus.git/tree/HEAD/src/js
I wrote maybe 3 lines of JavaScript, the rest was all prompted.
I'm leaning on the future growth of AI capabilities to help me here; otherwise I'll have to do it myself.
That is a tomorrow problem; there's too much project structure/functionality to get right first.
With most projects where innovation is a key requirement, the goal isn't to write textbook quality code, it's to prove your ideas work and quickly evolve the project.
Once you have an idea of how it's going to work, you can then choose to start over from scratch or continue on and clean up all the bits you skipped over.
Right now I'm in the innovation cycle, and having AI able to pick up whole API path strategies and pivot them is incredibly amazing.
How many times have you used large APIs and seen the clear hands of different developers and different URI strategies? With an AI, you just pivot.
Code quality and pen tests are critical, but they can come later.
I am wondering: what sort of tasks are you seeing this 20x boost on?
Claude code has made bootstrapping a new project, searching for API docs, troubleshooting, summarizing code, finding a GitHub project, building unit tests, refactoring, etc easily 20x faster.
It’s the context switching that is EXTREMELY expensive for a person, but costless for the LLM. I can focus on strategy (planning features) instead of being bogged down in lots of tactics (code warnings, syntax errors).
Claude Code is amazing, but the 20x gains aren’t evenly distributed. There are some projects that are too specialized (obscure languages, repos larger than the LLM’s context window, concepts that aren’t directly applicable to any codebase in their training corpus, etc). But for those of us using common languages and commodity projects, it’s a massive force multiplier.
I built my second iOS app (Swift) in about 3 days x 8 hours of vibe coding. A vocab practice app with adjustable learning profile, 3 different testing mechanisms, gamification (awards, badges), iOS notifications, text to speech, etc. My first iOS app was smaller, mostly a fork of another app, and took me 4 weeks of long days. 20x speed up with Claude Code is realistic.
And it saves even more time when researching + planning which features to add.
I scoped out a body of work, and even with the AI assisting on building cards and feature documentation, the estimate came to about 2 to 4 weeks to implement.
It was done in 2 days.
The key I've found to working as fast as possible is to have planning sessions with Claude Code and make it challenge you and ask tons of questions. Then get it to break the work into 'cards' (think Jira, but they are just .md files in your repo) and maintain a todo.md and done.md file pair that sorts and organizes the workflow.
Then start a new context, tell it to review todo.md and pick up the next task, and burn through it; when done, commit, update todo.md and done.md, /compact, and you're off on the next.
It's more than AI hinting at what to do; it's a whole new way of working, with rigor and structure around it. Then you just focus fire on the next card, and the next, and if you ever think up new features, you card them up and put them in the work queue.
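For illustration, the file pair ends up looking something like this - the card names and tasks here are hypothetical, just to show the shape; each card's full feature notes live in its own .md file:

todo.md
- CARD-012: add CSV export to the reports endpoint (see cards/card-012.md)
- CARD-013: rate-limit the login route
- CARD-014: migrate user avatars to object storage

done.md
- CARD-011: paginate the /orders list view

Each new context gets pointed at todo.md, picks the top card, and moves it to done.md when it lands.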
If one of these things isn’t true, you’re either a fool or those productivity increases aren’t real.
A simple example: if someone patents a machine that makes canned tuna 10 times faster than the way they're currently being made, would tuna factories make 10 times more money? The answer is obviously no. Actually, they'd make the same money as before, or even less. Only the one who makes such a machine (and the consumers of tuna cans) would benefit.
10x to 20x is in relation to time, so something that would have taken 2 weeks (80 hours) would be done in 8 hours to be 10x.
If those numbers were true, there would be an explosion of FOSS projects by now. Commercial products too.
Example: using LeafletJS — not hard, but I didn't want to have to search all over to figure out how to use it.
Example: other web page development requiring dropping image files, complicated scrolling, split-views, etc.
In short, there are projects I have put off in the past but eagerly begin now that LLMs are there to guide me. It's difficult to compare times and productivity in cases like that.
The Leaflet doc is a single-page document with examples you can copy-paste. There is page navigation at the top. Also, ctrl/cmd+F plus a keyword seems quicker than writing the prompt.
When I'm working with platforms/languages/frameworks I am already deeply familiar with, I don't think they save me much time at all. When I've tried to use them in this context, they seem to save me a bunch of time in some situations, but also cost me a bunch of time in others, resulting in basically a wash as far as time saved goes.
And for me a wash isn't worth the long-term cost of losing touch with the code by not being the one to have crafted it.
But when it comes to environments I'm not intimately familiar with they can provide a very easy on-ramp that is a much more pleasant experience than trying to figure things out through often iffy technical documentation or code samples.
> To directly measure the real-world impact of AI tools on software development, we recruited 16 experienced developers from large open-source repositories (averaging 22k+ stars and 1M+ lines of code) that they’ve contributed to for multiple years. Developers provide lists of real issues (246 total) that would be valuable to the repository—bug fixes, features, and refactors that would normally be part of their regular work. Then, we randomly assign each issue to either allow or disallow use of AI while working on the issue. When AI is allowed, developers can use any tools they choose (primarily Cursor Pro with Claude 3.5/3.7 Sonnet—frontier models at the time of the study); when disallowed, they work without generative AI assistance. Developers complete these tasks (which average two hours each) while recording their screens, then self-report the total implementation time they needed. We pay developers $150/hr as compensation for their participation in the study.
So it's a small sample size of 16 developers. And it sounds like different tasks were (randomly) assigned to the no-AI and with-AI groups - so the control group doesn't have the same tasks as the experimental group. I think this could lead to some pretty noisy data.
Interestingly - small sample size isn't in the list of objections that the author includes under "Addressing Every Objection You Thought Of, And Some You Didn't".
I do think it's an interesting study. But would want to see if the results could be reproduced before reading into it too much.
I think that's where you get 10-20x. When you're working on niche stuff, it's either not gonna work or it'll work poorly.
For example, right now I need to figure out why an ffmpeg filter doesn't do X thing smoothly, even though the C code for the filter is tiny and self-contained. Gemini refuses to add comments to the code. It just apologizes for not being able to add comments to 150 lines of code, lol.
However, for building an ffmpeg pipeline in Python, I was dumbfounded by how fast I was prototyping stuff and building fairly complex filter chains. If I'd had to do that by hand, just by reading the docs, it would've taken me a whole lot more time, effort and frustration, but it was a joy to figure out with Gemini.
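For a flavor of what those chains look like, here's a minimal sketch using the ffmpeg-python bindings; the filenames and filter parameters are made up for illustration, not from my actual pipeline:

import ffmpeg  # ffmpeg-python bindings (pip install ffmpeg-python)

src = ffmpeg.input("input.mp4")

# Scale the video to 720p, force 30 fps, and quiet the audio a little.
video = src.video.filter("scale", -2, 720).filter("fps", fps=30)
audio = src.audio.filter("volume", 0.8)

# Mux the filtered streams back together and run the ffmpeg command.
ffmpeg.output(video, audio, "output.mp4", vcodec="libx264", crf=23).overwrite_output().run()

Getting even a chain like that right straight from the ffmpeg filter docs is exactly the kind of fiddling the model shortcuts.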
So going back to the study: IMO it's flawed, because by definition working on new features for open source projects wouldn't be the bread and butter of LLMs. However, most people aren't working on stuff like this; they're rewriting the same code that 10,000 other people have written, but with their own tiny little twist or whatever.
My analogy for this is seeing people spend time trying to figure out how to change colors and draw shapes in PowerPoint, rather than focus on the content of the presentation. So here, we have developers focusing their efforts on correcting the AI output, rather than doing the research and improving their ability to deliver code in the future.
Hmm...
When I’m in the “zone” I wouldn’t go near an LLM, but when I’ve fallen out of the “zone” they can be useful tools in getting me back into it, or just finishing that one extra thing before signing off for the day
I think the right answer to “does LLM use help or hinder developer productivity” is “it depends on how you use them”
Wouldn't it be the opposite? I'd expect the code would be 47% longer because it's worse and heavier in tech debt (e.g. code repeated in multiple places instead of being factored out into a function).
AI isn't very good at being concise, in my experience. To the point of producing worse code. Which is a strange change from humans who might just have a habit of being too concise, but not by the same degree.
One: The work to get code to a reviewable point is significant. Skipping it, either with or without AI, is just going to elongate the review process.
Two: The whole point of using AI is to outsource the thought to a machine that can think much faster than you can in order to ship faster. If the normal dev process was 6 hours to write and 2 hours to review, and the AI dev process was 1 hour to write and 8 hours to review, the author will say "hey why is review taking so long; this defeats the purpose". You can't say "code review fixes these problems" and then bristle at the necessary extra review.
In my experience, review was inadequate back before we had AI spewing forth code of dubious quality. There's no reason to think it's any better now.
An actually-useful AI would be one that would make reviews better, do them itself, or at least help me get through reviews faster.
edit: should have mentioned the low-level stuff I work on is mature code and a lot of times novel.
Just last week I had to review some monstrosity of a FE ticket written by one of our backenders, with the comment "it's 90% there, should be good to take over". I had to throw out pretty much everything and rewrite it from scratch. My solution was about 150 lines modified, whereas the monstrous AI output was non-functional, ugly, a performance nightmare, and around 800 lines, with extremely unhelpful and generic commit messages to the tune of "Made things great!!1!1!!".
I can't even really blame them, the C-level craze and zeal for the AI shit is such that if you're not doing crap like this you get scrutinized and PIP'd.
At least frontenders usually have some humility and will tell you they have no clue if it's a good solution or not, while BEnders are always for some reason extremely dismissive of FE work (as can be seen in this very thread). It's truly baffling to me
I ended up shoehorned into backend dev in Ruby/Py/Java and don't find it improves my day-to-day a lot.
Specifically in C, it can bang out complicated but mostly common data structures without fault, where I would surely make off-by-one errors. I guess since I do C as a hobby, I tend to solve more interesting and complicated problems, like generating a whole array of dynamic C dispatchers from a UI-library spec in JSON that allows parsing and rendering a UI specified in YAML. Gemini Pro even spat out a YAML-dialect parser after a few attempts/fixes.
Maybe it's a function of familiarity and the problems you end up using the AI for.
For frontend though? The stuff I really don't specialize in (despite writing some of my first HTML in FrontPage back in 1997), it's a lifesaver. Just gotta be careful with prompts, since so many frontend frameworks are basically backend code at this point.
Things like "apply this known algorithm to that project-specific data structure" work really well and save plenty of time. Things that require a gut feeling for how things are organized in memory don't work unless you are willing to babysit the model.
In both of these cases, I found that just the smart auto-complete is a massive time-saver. In fact, it's more valuable to me than the interactive or agentic features.
Here's a snippet of some code that's in one of my recent buffers:
// The instruction should be skipped if all of its named
// outputs have been coalesced away.
if ! self.should_keep_instr(instr) {
return;
}
// Non-dropped should have a choice.
let instr_choice =
choices.maybe_instr_choice(instr_ref)
.expect("No choice for instruction");
self.pick_map.set_instr_choice(
instr_ref,
instr_choice.clone(),
);
// Incref all named def inputs to the PIR choice.
instr_choice.visit_input_defs(|input_def| {
self.def_incref(input_def);
});
// Decref all named def inputs to the SIR instr.
instr.visit_inputs(
|input_def| self.def_decref(input_def, sir_graph)
);
The actual code _I_ wrote was the comments. The savings from not having to type out the syntax are pretty big; about 80% of the time in manual coding would have been that - little typos, little adjustments to get the formatting right.

The other nice benefit is that I don't have to trust the LLM. I can evaluate each snippet right there, and typically the machine does a good job of picking up the syntactic style and semantics from the rest of the codebase and file and applying them to the completion.
The snippet, if it's not obvious, is from a bit of compiler backend code I'm working on. I would never have even _attempted_ to write a compiler backend in my spare time without this assistance.
For experienced devs, autocomplete is good enough for massive efficiency gains in dev speed.
I still haven't warmed to the agentic interfaces because I inherently don't trust the LLMs to produce correct code reliably, so I always end up reviewing it, and reviewing greenfield code is often more work than just writing it (esp now that autocomplete is so much more useful at making that writing faster).
Recently, my company has been investigating AI tools for coding. I know this sounds very late to the game, but we're a DoD consultancy, and one not traditionally associated with software development. So most of the people in the company are very impressed with the AI's output.
I, on the other hand, am a fairly recent addition to the company. I was specifically hired to be a "wildcard" in their usual operations. Which is to say, maybe 10 of us in a company of 3,000 know what we're doing regarding software (and that's being generous, because I don't really have visibility into half of the company). So that means 99.7% of the company doesn't have the experience necessary to tell what good software development looks like.
The stuff the people using the AI are putting out is... better than what the MilOps analysts pressed into writing Python-scripts-with-delusions-of-grandeur were doing before, but by no means what I'd call quality software. I have pretty deep experience in both back end and front end. It's a step above "code written by smart people completely inexperienced in writing software that has to be maintained over a lifetime", but many steps below, "software that can successfully be maintained over a lifetime".
You can skew the probability distribution a bit with careful prompting (LLMs told to claim to be math PhDs are better at math problems, for instance), but in the end all of those weights in the model are spent encoding the most probable outputs.
So, it will be interesting to see how this plays out. If the average person using AI is able to produce above average code, then we could end up in a virtuous cycle where AI continuously improves with human help. On the other hand, if this just allows more low quality code to be written then the opposite happens and AI becomes more and more useless.
When it comes to software, the entire reason maintainability is a goal is that writing and improving software is incredibly time-consuming and requires a lot of skill. It requires so much skill and time that during my decades in industry I rarely found code I would consider quality. Furthermore, the output from AI tools currently may have various drawbacks, but this technology is going to keep improving year over year for the foreseeable future.
These were maintainers of large open source projects. It's all relative. It's clearly providing massive gains for some and not as much for others. It should follow that its benefit to you depends on who you are and what you are working on.
It isn't black and white.
There are some very good findings though, like how the devs thought they were sped up but they were actually slowed down.
I thought it was the model, but then I realised: v0 is carried by the shadcn UI library, not the intelligence of the model.
Like, what if by focusing on LLMs for productivity we just reinforce old bad habits and get stuck in a local maximum... And even worse, what if being stuck with current so-so patterns, languages, etc. means we don't innovate in language design, tooling, or other areas that might actually be productivity wins?
I expect it'll balance.
I guess the tricky bit is, nobody knows what the future looks like. "The internet is a fad" in 1999 hasn't aged well, but a lot of people touted 1960s AI, XML and 3D televisions as things that'd be the standard tools within only a few years.
We're all just guessing till then.
Over IDK, 2-3 hours I got something that seemed on its face to work, but:
- it didn't use the pub/sub API correctly
- the 1 low-coverage test it generated didn't even compile (Go)
- there were a bunch of small errors it got confused by--particularly around closures
I got it to "90%" (again though it didn't at all work) with the first prompt, and then over something like a dozen more mostly got it to fix its own errors. But:
- I didn't know the pub/sub API--I was relying on Cursor to do this correctly--and it totally submarined me
- I had to do all the digging to get the test to compile
- I had to go line by line and tell it to rewrite... almost everything
I quit when I realized I was spending more time prompting it to fix things than it would take me to fully engage my brain and fix them myself. I also noticed that there was a strong pull to "just do one more prompt" rather than dig in and actually understand things. That's super problematic to me.
Worse, this wasn't actually faster. How do I know that? The next day I did what I normally do: read docs and wrote it myself. I spent less time (I'm a fast typist and a Vim user) overall, and my code works. My experience matches pretty well w/ the results of TFA.
---
Something I will say though is there is a lot of garbage stuff in tech. Like, I don't want to learn Terraform (again) just to figure out how to deploy things to production w/o paying a Heroku-like premium. Maybe I don't want to look up recursive CTEs again, or C function pointers, or spend 2 weeks researching a heisenbug I put into the code for some silly reason that AI would have caught immediately. I am _confident_ we can solve these things without boiling oceans to get AI to do it for us.
But all this shit about how "I'm 20x more productive" is totally absurd. The only evidence we have of this is people just saying it. I don't think a 20x productivity increase is even imaginable. Overall productivity since 1950 is up 3.6x [0]. These people are asking us to believe they've achieved over 400 years of productivity gains in "3 months". Extraordinary claims require extraordinary evidence. My guess is either you were extremely unproductive before, or (like others are saying in the threads) in very small ways you're 20x more productive but most things are unaffected or even slower.
Respectfully, this is user error.
They're not great at business logic though, especially if you're doing anything remotely novel. Which is the difficult part of programming anyway.
But yeah, to the average corporate programmer who needs to recreate the same internal business tool that every other company has anyway, it probably saves a lot of time.