I can’t imagine any other example where people voluntarily move to a black box approach.
Imagine taking a picture on autoshot mode and refusing to look at it. If the client doesn’t like it because it’s too bright, tweak the settings and shoot again, but never look at the output.
What is the logic here? Because if you can read code, I can’t imagine poking the result with black box testing being faster.
Are these people just handing off the review process to others? Are they unable to read code and hiding it? Why would you handicap yourself this way?
Early resistance to high-level (i.e. compiled) languages came from assembly programmers who couldn’t imagine that the compiler could generate code that was just as performant as their hand-crafted product. For a while they were right, but improved compiler design and the relentless performance increases in hardware made it so that even an extra 10-20% boost you might get from perfectly hand-crafted assembly was almost never worth the developer time.
There is an obvious parallel here, but it’s not quite the same. The high-level language is effectively a formal spec for the abstract machine which is faithfully translated by the (hopefully bug-free) compiler. Natural language is not a formal spec for anything, and LLM-based agents are not formally verifiable software. So the tradeoffs involved are not only about developer time vs. performance, but also correctness.
The "now that producing plausible code is free, verification becomes the bottleneck" people are technically right, of course, but I think they're missing the context that very few projects cared much about correctness to begin with.
The biggest headache I can see right now is just the humans keeping track of all the new code, because it arrives faster than they can digest it.
But I guess "let go of the need to even look at the code" "solves" that problem, for many projects... Strange times!
For example -- someone correct me if I'm wrong -- OpenClaw was itself almost entirely written by AI, and the developer bragged about not reading the code. If anything, in this niche, that actually helped the project's success, rather than harming it.
(In the case of Windows 11 recently.. not so much ;)
Vibe coding is closer to compiling your code, throwing the source away and asking a friend to give you source that is pretty close to the one you wrote.
Put another way, if you don't know what correct is before you start working then no tradeoff exists.
"Hey Claude, translate this piece of PHP code into Power10 assembly!"
I'm just describing what I'm doing now, and what I'm seeing at the leading edge of using these tools. It's a different approach - but I think it'll become the most common way of producing software.
> and wouldn't look at the code anymore than, say, a PHP developer would look at the underlying assembly
This really puts down the work that the PHP maintainers have done. Many people spend a lot of time crafting the PHP codebase so you don't have to look at the underlying assembly. There is a certain amount of trust that I as a PHP developer assume.
Is this what the agents do? No. They scrape random bits of code everywhere and put something together with no craft. How do I know they won't hide exploits somewhere? How do I know they don't leak my credentials?
If the app is too bright, you tweak the settings and build it again.
Photography used to involve developing film in dark rooms. Now my iPhone does... god knows what to the photo - I just tweak in post, or reshoot. I _could_ get the raw, understand the algorithm to transform that into sRGB, understand my compression settings, etc - but I don't need to.
Similarly, I think there will be people who create useful software without looking at what happens in between. And there will still be low-level software engineers for whom what happens in between is their job.
The output of code isn't just the code itself, it's the product. The code is a means to an end.
So the proper analogy isn't the photographer not looking at the photos, it's the photographer not looking at what's going on under the hood to produce the photos. Which, of course, is perfectly common and normal.
I’ll bite. Is this person manually testing everything that one would normally unit test? Or writing black-box tests that he knows are correct because they were manually written?
If not, you’re not reviewing the product either. If yes, it’s less time-consuming to actually read and test the damn code.
So a percentage of your code, based on your gut feeling, is left unseen by any human at the moment you submit it.
Do you agree that this raises the chance of bugs slipping by? I don’t see how you wouldn’t.
And considering that your total code output is larger, the amount of buggy code in it is larger too, and (presumably) you ship faster. Have you considered what that compounds to in terms of the likelihood of incidents?
My approach is similar. I invest in the harness layer (tests, hooks, linting, pre-commit checks). The code review happens, it's just happening through tooling rather than my eyeballs.
I've found that focusing my attention upstream (specs, constraints, test harness) yields better outcomes than poring over implementation details line by line. The code is still there if I need it. I just rarely need it.
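To make "review through tooling" concrete, here's a minimal sketch of the kind of gate I mean. It's hypothetical Python, and the tool choices (ruff, mypy, pytest) are my assumptions, not a prescription; substitute whatever your stack uses:

    #!/usr/bin/env python3
    # Hypothetical pre-commit gate: reject the commit unless every check
    # passes. The specific tools listed here are illustrative assumptions.
    import subprocess
    import sys

    CHECKS = [
        ["ruff", "check", "."],  # lint
        ["mypy", "src"],         # type-check
        ["pytest", "-q"],        # run the test suite
    ]

    for cmd in CHECKS:
        if subprocess.run(cmd).returncode != 0:
            sys.exit(f"pre-commit: {' '.join(cmd)} failed; fix before committing")

The point is that the agent's output has to clear the same bar every time, whether or not my eyeballs ever touch the diff.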
Specious analogies don't help anything.
It is right often enough that your time is better spent testing the functionality than the code.
Sometimes it’s not right, and you need to re-instruct (often) or dive in (not very often).
Anyone overseeing work from multiple people has to? At some point you have to let go and trust people's judgement, or, well, let them go. Reading and understanding the whole output of 9 concurrently running agents is impossible. People who do that (I'm not one of them btw) must rely on higher-level reports, maybe drilling into this or that piece of code occasionally.
Indeed. People. With salaries, general intelligence, a stake in the matter and a negative outcome if they don’t take responsibility.
>Reading and understanding the whole output of 9 concurrently running agents is impossible.
I agree. It is also impossible for a person to drive two cars at once… so we don’t. Why is the starting point of the conversation that one should be able to use 9 concurrent agents?
I get it, writing code no longer has a physical bottleneck. So the bottleneck becomes the next thing, which is our ability to review outputs. It’s already a giant advancement, why are we ignoring that second bottleneck and dropping quality assurance as well? Eventually someone has to put their signature on the thing being shippable.
That's not a black box though. Someone is still reading the code.
> At some point you have to let go and trust people's judgement
Where's the people in this case?
> People who do that (I'm not one of them btw) must rely on higher-level reports.
Does such a thing exist here? Just "done".
But you are not. That’s the point?
> Where's the people in this case?
Juniors build worse code than Codex. Their superiors also can't check everything they do. They need to extend some level of trust, even knowing juniors will do dumb shit, or they can't hire juniors.
> Does such a thing exist here? Just "done".
Not sure what you mean. You can definitely ask the agent what it built, why it built it, and what could be improved. You will get only part of the info compared to reading the output yourself, but it won’t be zero info.
I can think of a few. The last 78 pages of any 80-page business analysis report. The music tracks of those "12 hours of chill jazz music" YouTube videos. Political speeches written ahead of time. Basically - anywhere that a proper review is more work than the task itself, and the quality of output doesn't matter much.
To be fair - it is not accurate to say I absolutely never read the code. It's just rare, and it's much more the exception than the rule.
My workflow just focuses much more on the final product, and the initial input layer, not the code - it's becoming less consequential.
I personally think we are not that far from it, but it will need something built on top of current CLI tools.
I don't know... it depends on the use case. I can't imagine even the best front-end engineer ever reading HTML faster than they can look at the rendered webpage to check whether the layout is correct.
It's producing seemingly working code faster than you can closely review it.
The AI also writes the black box tests, what am I missing here?
If the AI misinterpreted your intentions and/or missed something in production code, the tests it writes are likely to reproduce that behavior rather than catch it.
In other words, if “the AI is checking as well”, no one is.
etc...etc...
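To spell out the failure mode with a made-up example (hypothetical Python; the spec and the bug are invented for illustration):

    # Suppose the spec said "discount orders of $100 or more", but the model
    # implemented a strict greater-than. Amounts are in cents.

    def apply_discount(total_cents: int) -> int:
        """Apply a 10% discount to qualifying orders."""
        if total_cents > 100_00:  # bug: the spec said "or more", i.e. >=
            return total_cents * 9 // 10
        return total_cents

    # A test generated from the same misreading passes happily and locks
    # the bug in:
    def test_apply_discount():
        assert apply_discount(150_00) == 135_00
        assert apply_discount(50_00) == 50_00
        # The boundary case that would expose the bug never gets written:
        # assert apply_discount(100_00) == 90_00

The implementation and the tests share one (wrong) understanding of the intent, so the suite is green and the boundary bug ships.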
> In other words, if “the AI is checking as well”, no one is.
"I tried nothing, and nothing at all worked!"
Code is not the output. Functionality is the output, and you do look at that.
Are you writing black-box tests by hand, or manually checking, everything that would normally be a unit test? We have unit tests precisely because of how unworkable the “every test is black box” approach is.
Almost everyone does this. Hardly anyone taking pictures understands what f-stop or focal length are. Even those who do seldom adjust them.
There are dozens of other examples where people voluntarily move to a black box approach. How many Americans drive a car with a manual transmission?
Something I know very little about is coding. I know there are different languages with pros and cons to each. I know some work across operating systems while others don't but other than that I don't know too much.
For the first time, I just started working on my own app in Codex and it feels absolutely amazing and magical. I've not seen the code, and I'd have basically no idea how to read it, but I'm working on a niche application for my job that is custom-tailored to my needs, and if it works I'll be thrilled. Even better, the process of building just feels so special and awesome.
This really does feel like it is on the precipice of something entirely different. I think back to computers before the GUI, or even just computers before mobile touch interfaces. I am sure there were plenty of people who thought those things wouldn't work for various reasons, but I think that is the wrong idea. The focus should be on who this will work for and why, and there, I think, are a ton of possibilities.
For reference, I'm a middle school Assistant Principal working on an app to help me with student scheduling.
Never thought this would be something people actually take seriously. It really makes me wonder if in 2-3 years there will be so much technical debt that we'll have to throw away entire pieces of software.
The author of the article has a bachelor's degree in economics[1], worked as a product manager (not a dev) and only started using GitHub[2] in 2025 when they were laid off[3].
[1] https://www.linkedin.com/in/benshoemaker000/
You have to remember that the number of software developers swelled massively in the last 20 years, and many of these folks are bootcamp-educated web/app dev types, not John Carmack. Statistically, under pre-AI circumstances, they started too late and for the wrong reasons to become very skilled in the craft by middle age (of course there are many wonderful exceptions; one of my best developers is someone who worked in a retail store for 15 years before pivoting).
AI tools are now available to everyone, not just the developers who were already proficient at writing code. When you take in the excitement you always have to consider what it does for the average developer and also those below average: A chance to redefine yourself, be among the first doing a new thing, skip over many years of skill-building and, as many of them would put it, focus on results.
It's totally obvious why many leap at this, and it's probably even what they should do, individually. But it's a selfish concern, not care for the practice as such. It also results in a lot of performative blog posting. But if it were you, you might well do the same to get ahead in life. There are only so many opportunities to get in on something on the ground floor.
I feel a lot of senior developers don't take the demographics of our community of practice into account when they try to understand the reception of AI tools.
Rarely has someone taken the words right out of my mouth like this.
The percentage of devs in my career who come from the same academic background, show similar interests, and approach the field the same way is probably less than 10%, sadly.
> There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
Notice "don't read the diffs anymore".
In fact, this is practically the anniversary of that tweet: https://x.com/karpathy/status/2019137879310836075?s=20
I've worked on "legacy systems" written 30 to 45 years ago (or more) and still running today (things like green-screen apps written in Pick/Basic, Cobol, etc.). Some of them were written once and subsystems replaced, but some of it is original code.
In systems written in the last.. say, 10 to 20 years, I've seen them undergo drastic rates of change, sometimes full rewrites every few years. This seemed to go hand-in-hand with the rise of agile development (not condemning nor approving of it) - where rapid rates of change were expected.. and often the tech the system was written in was changing rapidly also.
In hardware engineering, I personally also saw a huge move to more frequent design and implementation refreshes to prevent obsolescence issues (some might say this is "planned obsolescence" but it also is done for valid reasons as well).
I think not reading the code anymore TODAY may be a bit premature, but I don't think it's impossible that someday, in the nearer rather than farther future, we'll be at a point where generative systems are more predictable, and the generated code maybe even gets certified for safety etc., leading to truly not reading the code.
I'm not sure it's a good future, or that it's tomorrow, but it might not be beyond the next 20 year timeframe either, it might be sooner.
What is your opinion? And did you vote this down because you think it's silly or dangerous, or because you don't agree?
That happens just as often without AI. Maybe the people who like it all have experience with trashing multiple sets of products over the course of their lives?
Sometimes it feels like pre-AI education is going to be like low-background steel for skilled employees.
We have been throwing away entire pieces of software forever. Where's Novell? Who runs 90s Linux kernels in prod?
Code isn't a bridge or a car. Preservation isn't meaningful. If we aren't shutting the DCs off, we're still burning the resources regardless of whether we save old code or not.
Most coders are so many layers of abstraction above the hardware at this point anyway they may as well consider themselves syntax artists as much as programmers, and think of Github as DeviantArt for syntax fetishists.
Am working on a model of /home to experiment with booting Linux to models. I can see a future where the Python on my screen "runs" without an interpreter, because the model is capable of correctly generating the appropriate output without one.
Code is an ethno-object; it only exists socially. It's not essential to computer operations. At the hardware level it's arithmetical operations against memory states.
Am working on my own "geometric primitives" models that know how to draw GUIs, 3D world primitives, and text; think "boot to Blender". Rather than store data in strings, I'll just scaffold out vectors to a running "desktop metaphor".
It's just electromagnetic geometry, delta sync between memory and display: https://iopscience.iop.org/article/10.1088/1742-6596/2987/1/...
I think 2-3 years is generous.
Don't get me wrong, I've definitely found huge productivity increases in using various LLM workflows in both development as well as operational things. But removing a human from the loop entirely at this point feels reckless bordering on negligent.
The only way to harness it is to somehow package code-producing LLMs into an abstraction and then somehow validate the output. Until we achieve that, IMO it doesn't matter how closely people watch the output; things will keep getting worse.
Depending on what you're working on, they are already at that point. I'm not into any kind of AI-maximalist "I don't read code" BS (I read a lot of code), but I've been building a fairly extensive web app to manage my business using Astro + React, and I have yet to find any bug or usability issue that Claude Code can't fix much faster than I would have (+). I've been able to build out, in a month, a fully TDD app that would have conservatively taken me a year by myself.
(+) Except for making the UI beautiful. It's crap at that.
The key that made it click is exactly what the person describes here: using specs that describe the key architecture and use cases of each section. So I have docs/specs with files like layout.md (overall site shell info), ui-components.md, auth.md, database.md, data.md, and lots more for each section of functionality in the app. If I'm doing work that touches ui, I reference layout and ui-components so that the agent doesn't invent a custom button component. If I'm doing database work, reference database.md so that it knows we're using drizzle + libsql, etc.
This extends up to higher level components where the spec also briefly explains the actual goal.
Then each feature building session follows a pattern: brainstorm and create design doc + initial spec (updates or new files) -> write a technical plan clearly following TDD, designed for batches of parallel subagents to work on -> have Claude implement the technical plan -> manual testing (often, I'll identify problems and request changes here) -> automated testing (much stricter linting, knip etc. than I would use for myself) -> finally, update the spec docs again based on the actual work that was done.
My role is less about writing code and more about providing strict guardrails. The spec docs are an important part of that.
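For the curious, the shape of it is roughly this (the file names are the real ones from my setup above; the comments are just a gloss, and the exact contents will vary by project):

    docs/specs/
        layout.md          # overall site shell
        ui-components.md   # shared components, so agents don't invent new buttons
        auth.md            # auth flows and conventions
        database.md        # drizzle + libsql conventions
        data.md            # data-access patterns
        ...                # one file per section of functionality

Each feature session reads only the specs it touches, which keeps the context small and the agent on rails.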
My overall stance on this is that it's better to lean into the models & the tools around them improving. Even in the last 3-4 months, the tools have come an incredible distance.
I bet some AI-generated code will need to be thrown away. But that's true of all code. The real questions to me are: are the velocity gains worth it? Will the models be so much better in a year that they can fix those problems themselves, or rewrite it?
I feel like time will validate that.
I can't imagine not reading the code I'm responsible for any more than I could imagine not looking out the windscreen in a self driving Tesla.
But if so many people are already there, and mostly highly skilled programmers at that, imagine in 2 years' time with people who've never programmed!
> For 12 years, I led data and analytics at Indeed - creating company-wide success metrics used in board meetings, scaling SMB products 6x, managing organizations of 70+ people.
He's a manager that made graphs on Power BI.
They're not here because they want to build things, they're here to shit a product out and make money. By the time Claude has stopped being able to pipe together ffmpeg commands or glue together 3 JS libraries, they've gone on to another project and whoever bought it is a sucker.
It's not that much different from the companies of the 2000s promising a 5th generation language with a UI builder that would fix everything.
And then, as a very last warning: the author of this piece sells AI consulting services. It's in his interest to make you believe everything he has to say about AI, because by God are there going to be suckers buying his time at indecently high prices to get shit advice. This sucker is most likely your boss, by the way.
What have you tried to use them for?
And for anything really serious? Opus 4.5 struggles to maintain a large-scale, clean architecture. And the resulting software is often really buggy.
Conclusion: if you want quality in anything big in February 2026, you still need to read the code.
That goes a bit against the article, but it's not reading code in the traditional sense where you are looking for common mistakes we humans tend to make. Instead you are looking for clues in the code to determine where you should improve in the docs and specs you fed into your agent, so the next time you run it chances are it'll produce better code, as the article suggests.
And I think this is good. In time, we are going to be forced to think less technically and more semantically.
Become a CTO, CEO or even a venture investor. "Here's $100K worth tokens, analyze market, review various proposals from Agents, invest tokens, maximize profit".
You know why not? Because it will be more obvious it doesn't work as advertised.
* AI can replace knowledge workers - most existing software engineers and managers at all levels will lose their jobs and have to re-qualify.
* AI requires human in the loop.
In the first scenario, I see no reason to waste time and should start building plan B now (remaining job markets will be saturated at that point).
In the second scenario, tech debt and zettabytes of slop will harm the companies that relied on it heavily. In an age of failing giants and crumbling infrastructure, engineers and startups that can replace a gigawatt-burning data center with a few-kilowatt rack, by manually coding a shell script that replaces Hadoop, will flourish.
Most probably it will be a spectrum - some roles can be replaced, some not.
You can guess what kind of software he is building.
When you read the 100th blog post about how AI is changing software development, just remember that these are the authors.
I'm mostly developing my own apps and working with startups.
The answer is clear: I didn’t write the code, I didn’t read it, I have no idea what it does, and that’s why it has a bug.
Which is perhaps what they should do, of course. Any transition is a chance to get ahead and redefine yourself.
Agents are a quality/velocity tradeoff (which is often good). If you can't debug stuff without them, that's a problem, as you'll get into holes. But that doesn't mean you have to write the code by hand.
Note though we're talking about "not reading code" in context, not the writing of it.
So basically a return to waterfall design.
Rather than YOLO planning (agile), we go back to YOLO implementation (farming it out to dozens of replaceable peons, but this time they're even worse).
So humble. Who is he again?
> Senior Technical Product Manager
yeah i'd wager they didn't read (let alone write) much code to begin with..
What it's trying to express is that the (T)PM job should still be safe, because they can just team-lead a dozen agents instead of software developers.
Take with a grain of salt when it comes to relevance for "coding", or the future role breakdown in tech organizations.
I'm not trying to express that my particular flavor of career is safe. I think that the ability to produce software is much less about the ability to hand-write code, and that's going to continue as the models and ecosystem improve, and I'm fascinated by where that goes.
Recently I picked a smallish task from our backlog. This is some code I'm not familiar with, frontend stuff I wouldn't tackle normally.
Claude wrote something. I tested, it didn't work. I explained the issue. It added a bunch of traces, asked me to collect the logs, figured out a fix, submitted the change.
Got a bunch of linter errors that I didn't understand, and that I copied and pasted to Claude. It fixed something, but I still got lint errors, which Claude dismissed as irrelevant; meanwhile, I realized I wasn't happy with the new behavior.
After 3 days of iteration, my change seems ok, passed the CI, the linters, and automatic review.
At that stage, I have no idea if this is the right way to fix the problem, and if it breaks something, I won't be able to fix it myself as I'm clueless. Also, it could be that a human reviewer tells me it's totally wrong, or asks me questions I won't be able to answer.
Not only was this process not fun at all, but I also didn't learn anything, and I may have introduced technical debt that AI may not be able to fix.
I agree that coding agents can boost efficiency in some cases, but I don't see the IDE shifting left at this stage.
https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...
Is that a Nano Banana tendency, or was it intentional?
Here's the prompt I used, actually:
Create a vibrant, visually dynamic horizontal infographic showing the spectrum of AI developer tools, titled "The Shift Left"
Layout: 5 distinct zones flowing RIGHT TO LEFT as a journey/progression. Use creative visual metaphors — perhaps a road, river, pipeline, or abstract flowing shapes connecting the stages. Each zone should feel like its own world but connected to the others.
Zones (LEFT to RIGHT):
1. "Specs" (leftmost) - Kiro logo, VibeScaffold logo, GitHub Spec Kit logo
Label: "Requirements → Design → Tasks"
2. "Multi-Agent Orchestration" - Claude Code logo, Codex CLI logo, Codex App logo, Conductor logo Label: "Parallel agents, fire & forget"
3. "Agentic IDE" - Cursor logo, Windsurf logo Label: "Autonomous multi-file edits"
4. "Code + AI" - GitHub Copilot logo Label: "Inline suggestions"
5. "Code" (rightmost) - VS Code logo Label: "Read & write files"
Visual style: Fun, energetic, modern. Think illustrated tech landscape or isometric world. NOT a boring corporate chart. Use warm off-white background (#faf8f5) with amber/orange (#b45309) as the primary accent color throughout. Add visual flair — icons, small illustrations, depth, texture, but don't make it visually overloaded.Aspect ratio: 16:9 landscape
We are very far away from this being a settled or agreed upon statement and I really struggle to understand how one vendor making a tool is indicative of an industry practice.
You may read all the assembly that your compiler produces. (Which, awesome! Sounds like you have a fun job.) But I don't. I know how to read assembly and occasionally do it. But I do it rarely enough that I have to re-learn a bunch of stuff to solve the hairy bug or learn the interesting system-level thing that I'm trying to track down if I'm reading the output of the compiler. And mostly even when I have a bug down at the level where reading assembly might help, I'm using other tools at one or two removes to understand the code at that level.
I think it's pretty clear that "reading the code" is going to go the way of reading compiler output. And quite quickly. Even for critical production systems. LLMs are getting better at writing code very fast, and there's no obvious reason we'll hit a ceiling on that progress any time soon.
In a world where the LLMs are not just pretty good at writing some kinds of code, but very good at writing almost all kinds of code, it will be the same kind of waste of time to read source code as it is, today, to read assembly code.
Compilers predictably transform one kind of programming language code to CPU (or VM) instructions. Transpilers predictably transform one kind of programming language to another.
We introduced various instruction architectures, compiler flags, reproducible builds, checksums exactly to make sure that whatever build artifact that's produced is super predictable and dependable.
That reproducibility is how we can trust our software and that's why we don't need to care about assembly (or JVM etc.) specifics 99% of the time. (Heck, I'm not familiar with most of it.)
Same goes for libraries and frameworks. We can trust their abstractions because someone put years or decades into developing, testing and maintaining them and the community has audited them if they are open-source.
It takes a whole lot of hand-waving to traverse from this point to LLMs - which are stochastic by nature - transforming natural language instructions (even if you call it "specs", it's fundamentally still a text prompt!) to dependable code "that you don't need to read" i.e. a black box.
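The contrast is easy to state mechanically. A small sketch (Python; the file paths are placeholders, not a real build):

    # The compiler trust chain in miniature: same source + same toolchain
    # should yield a byte-identical artifact, which anyone can verify.
    import hashlib

    def digest(path: str) -> str:
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    # Two independent builds must agree, or something is wrong:
    # assert digest("build-a/app.bin") == digest("build-b/app.bin")

There is no analogous check for an LLM: run the same prompt twice and you can get two different programs, so the "spec" never pins down the artifact the way source code pins down a reproducible build.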
But with the AI tools we're not yet at the wave of "sometimes it's good to read the code" virtue-signaling blog posts that will make the front page next year or so; we're still at the "I'm the new hot shit because I don't read code" moment, which is all a bit hard to take.
Even in those environments, I'd argue that AI coding can offer a lot in terms of verification & automated testing. However, I'd probably agree, in high-stakes safety environments, it's more of a 'yes and' than an either/or.
What’s worth paying for is something that is trustworthy.
Claude Code is a perfect example: they blocked tools like opencode because they know quality is the only moat, and they don't currently have it.
the constant asking drives me crazy
Also, the generated picture in this post makes me want to kick someone in the nuts. It doesn't explain anything.
Is the image really not that clear? There are IDE-like tools that all are focusing on different parts of the Spec --> Agent --> Code continuum. I think it illustrates that all right.
9 times out of 10, my AI-generated code is bad before my verification layers; 9 times out of 10, it's good after.
Claude fights through your rules. And if you code in another language, you could use other agents to verify the code.
This is the challenge now: effectively verifying the code. Whenever I end up with a bad response, I ask myself what layers I could set up to stop the AI as early as possible.
Also things like naming, comments, tree traversal, context engineering, even data structures and multi-agenting. I know they sound like buzzwords, but these are the topics a software engineer really should think about. Everything else is frankly cope.