> it’s just embarrassing — it’s as if the writer is walking around with their intellectual fly open.
I think Oxide didn't include this in the RFD because they exclusively hire senior engineers, but in an organization that contains junior engineers I'd add something specific to help junior engineers understand how they should approach LLM use.
Bryan has 30+ years of challenging software (and now hardware) engineering experience. He memorably said that he's worked on and completed a "hard program" (an OS), which he defines as a program you doubt you can actually get working.
The way Bryan approaches an LLM is super different to how a 2025 junior engineer does so. That junior engineer possibly hasn't programmed without the tantalizing, even desperately tempting option to be assisted by an LLM.
Years ago I had to spend many months building nothing but Models (as in MVC) for a huge data import/ingest the company I worked at was rewriting. It was just messy enough that it couldn't be automated. I almost lost my mind from the dull monotony and even started having attendance issues. I know today that could have been done with an LLM in minutes. Almost crazy how much time I put into that project compared to what it would take today.
But a junior engineer would never find/anticipate those issues.
I am a bit concerned, because the kind of software I am making is something an LLM would never come up with on its own. A junior cannot make it; it requires research and programming experience that they do not have. But I know that if I were a junior today, I would probably try to use LLMs as much as possible and would probably know less programming over time.
So it seems to me that we are likely to have worse software over time. Perhaps a boon for senior engineers but how do we train junior devs in that environment? Force them to build slowly, without llms? Is it aligned with business incentives?
Do we create APIs expecting the code to be generated by LLMs or written by hand? Because the impact of verbosity is not necessarily the same. LLMs don't get tired as fast as humans.
IMO, it's already happening. I had to change some personal information on a bunch of online services recently, and two out of seven of them were down. One of them is still down, a week later. This is the website of a major utilities company. When I call them, they acknowledge that it's down, but say my timing is just bad. That combined with all the recent outages has left me with the impression that software has been getting (even more) unreliable, recently.
So of course it’s going to generate code that has non-obvious bugs in it.
Ever play the Undefined Behaviour Game? Humans are bad at being compilers and catching mistakes.
I’d hoped… maybe still do, that the future of programming isn’t a shrug and, “good enough.” I hope we’ll keep developing languages and tools that let us better specify programs and optimize them.
Obviously if it's anything even minorly complex you can't trust the LLM hasn't found a new way to fool you.
It was only then did she introduce us to the glory that was Adobe Dreamweaver, which (obviously) increased our productivity tenfold.
I think what craftsmen miss is the difference in goals. Projects fall on a spectrum from a long-lived app that constantly evolves with a huge team working on it, to code that's never opened again after release. In the latter, like movie or music production (or most video games), only the end result matters; the how is not part of the final product. Working for years with designers and artists really gave me perspective on process vs end result and what matters.
That doesn’t mean the end result is messy or doesn’t have craftsmanship. Like if you call a general contractor or carpenter for a specific job, you care that the end result is well made, but if they tell you that they built a whole factory for your little custom-made project (the equivalent of a nice codebase), not only does it not matter to you, but it’ll be wildly overpriced and delayed. In my agency that means the website is good-looking and bug-free after being built, no matter how messy the temporary construction site was.
In contrast if you work on a SaaS or a long lived project (e.g. an OS) the factory (the code) is the product.
So to me when people say they are into code craftsmanship I think they mean in reality they are more interested in factory building than end product crafting.
Say it takes 2 hours to implement a feature, and another hour making it logically/architecturally correct. You bill $600 and eat $200 for goodwill and your own personal/organizational development. You're still making $200/hr and you never find yourself in meetings with normie clients about why refactoring, cohesiveness, or quality was necessary.
I'm not saying they don't have their place, but without us they would still be making the world go round. Only backwards.
It’s lovely to have the time to do that. This time comes once the other type of engineer has shipped the product and turned the money flow on. Both types have their place.
* a bad craftsman will get pedantic about the wrong things (e.g. SOLID/DRY as dogma) and will create architectures that will make development velocity plummet ("clever" code, deep inheritance chains, "magic" code with lots of reflection etc.)
* a bad practitioner will not care about long term maintainability either, or even about correctness enough not to introduce a bunch of bad bugs or slop, even worse when they're subtle enough to ship but mess up your schema or something
So you can have both good and bad outcomes with either, just for slightly different reasons (caring about the wrong stuff vs not caring). I think the sweet spot is to strive for code that is easy to read and understand, easy to change, and easy to eventually replace or throw out. Obviously performant enough, but yadda yadda premature optimization, depends on the domain and so on...
The _vti_cnf dir left /etc/passwd downloadable, so I grabbed it from my school website. One John the Ripper run later and the password was found.
I told the teacher responsible for the IT that it was insecure, and that ended up getting me some work experience. I ended up working the summer (waiting for my GCSE results) for ICL, which immeasurably helped me when it was time to properly start working.
Did think about defacing, often wonder that things could have turned out very much differently!
I don’t particularly remember why, but “hand writing” fancy HTML and CSS used to be a flex in some circles in the 90s. A bunch of junk and stuff like fixed positioning in the source was the telltale sign they “cheated” with FrontPage or Dreamweaver lol
Dreamweaver was to web development what ...
I just sat here for 5 minutes and I wasn't able to finish that sentence. So I think that's a statement in itself.
People with very little competence could and did get things done, but it was a mess underneath.
https://developer.adobe.com/dreamweaver/
And yes, as you can imagine given the kind of comments I make about high-level productive tooling and languages, I was a big Dreamweaver fan back in the 2000s.
My first PHP scripts and games were written using nothing more than Notepad too, funnily enough.
Where it struggles: problems requiring taste or judgment without clear right answers. The LLM wants to satisfy you, which works great for 'make this exploit work' but less great for 'is this the right architectural approach?'
The craftsman answer might be: use LLMs for the systematic/tedious parts (code generation, pattern matching, boilerplate) while keeping human judgment for the parts that matter. Let the tool handle what it's good at, you handle what requires actual thinking.
This is a key difference. I've been writing software professionally for over two decades. It took me quite a long time to overcome certain invisible (to me) hesitations and objections to using LLMs in software development workflows. At some point the realization came to me that this is simply the new way of doing things, and from this point onward, these tools will be deeply embedded in and synonymous with programming work. Recognizing this phenomenon for what it is somehow made me feel young again -- perhaps that's just the crust breaking around a calcified grump, but I do appreciate being able to tap into that all the same.
He surely has his fly closed when cutting through the hype with reflection and pragmatism (without the extreme positions on both sides often seen).
It's a brilliant skewering of the 'em dash means LLM' heuristic as a broken trick.
I wonder which of these camps are right.
For novices, LLMs are infinitely patient rubber ducks. They unstick the stuck; helping people past the coding and system management hurdles that once required deep dives through Stack Overflow and esoteric blog posts. When an explanation doesn’t land, they’ll reframe until one does. And because they’re confidently wrong often enough, learning to spot their errors becomes part of the curriculum.
For experienced engineers, they’re tireless boilerplate generators, dynamic linters, and a fresh set of eyes at 2am when no one else is around to ask. They handle the mechanical work so you can focus on the interesting problems.
The caveat for both: intentionality matters. They reward users who know what they’re looking for and punish those who outsource judgment entirely.
This gives me somewhat of a knee jerk reaction.
When I started programming professionally in the 90s, the internet came of age, and I remember being told "in my days, we had books and we remembered things", which of course is hilarious because today you can't possibly retain ALL the knowledge needed to be a software engineer, due to the sheer amount of knowledge required to produce a meaningful product. It's too big and it moves too fast.
There was this long argument that you should know things and not have to look it up all the time. Altavista was a joke, and Google was cheating.
Then syntax highlighting came around and there'd always be a guy going "yeah nah, you shouldn't need syntax highlighting to program, your screen looks like a Christmas tree".
Then we got stuff like auto-complete, and it was amazing, the amount of keystrokes we saved. That too, was seen as heresy by the purists (followed later by LSP - which many today call heresy).
That reminds me also, back in the day, people would have entire encyclopaedia collections on DVDs. Did they use them? No. But they criticised Wikipedia for being inferior. Look at today, though.
Same thing with LLMs. Whether you use them as a powerful context based auto-complete, as a research tool faster than wikipedia and google, as rubber-duck debugger, or as a text generator -- who cares: this is today, stop talking like a fossil.
It's 2025 and junior developers can't work without LSP and LLM? It's fine. They're not in front of a 386 DX33 with 1 book of K&R C and a blue EDIT screen. They have massive challenges ahead of them, the IT world is in complete shambles, and it's impossible to decipher how anything is made, even open source.
Today is today. Use all the tools at hand. Don't shame kids for using the best tools.
We should be talking about sustainability of such tools rather than what it means to use them (cf. enshittification, open source models etc.)
The Internet itself is full of distractions. My younger self spent a crazy amount of time on IRC. So it's not different than spending time on say, Discord today.
LLMs have pretty much a direct parallel with Google: the quality of the response has much to do with the quality of the prompt. If anything, it's the overwhelming nature of LLMs that might be the problem. Back in the day, if you had, say, library access, the problem was knowing what to look for. Discoverability with LLMs is exponential.
As for LLM as auto-complete, there is an argument to be made that typing a lot reinforces knowledge in the human brain like writing. This is getting lost, but with productivity gains.
Tools like Claude code with ask/plan mode seem to be better in my experience, though I absolutely do wonder about the lack of typing causing a lack of memory formation
A rule I set myself a long time ago was to never copy paste code from stack overflow or similar websites. I always typed it out again. Slower, but I swear it built the comprehension I have today.
Nowadays I'm back to a text editor rather than an IDE, though fortunately one with much more creature comforts than n++ at least.
I'm glad I went down that path, though I can't say I'd really recommend it, as things felt a bit simpler back then.
That's not an LLM problem, they'd do the same thing 10 years ago with stack overflow: argue about which answer is best, or trust the answer blindly.
Normal auto complete plus a code tool like Claude Code or similar seem far more useful to me.
I have the same policy. I do the same thing for example code in the official documentation. I also put in a comment linking to the source if I end up using it. For me, it’s like the RFD says, it’s about taking responsibility for your output. Whether you originated it or not, you’re the reason it’s in the codebase now.
For interns/junior engineers, the choice is: comprehension VS career.
And I won't be surprised if most of them will go with career now, and comprehension.. well thanks maybe tomorrow (or never).
It shouldn't be, but it is.
That comparison undermines the integrity of the argument you are trying to make.
it isn't hilarious, it's true. My father (now in his 60s) who came from a blue collar background with very little education taught himself programming by manually copying and editing software out of magazines, like a lot of people his age.
I teach students now who have access to all the information in the world but a lot of them are quite literally so scatterbrained and heedless anything that isn't catered to them they can't process. Not having working focus and memory is like having muscle atrophy of the mind, you just turn into a vegetable. Professors across disciplines have seen decline in student abilities, and for several decades now, not just due to LLMs.
Reading books was never about knowledge. It was about knowhow. You didn't need to read all the books. Just some. I don't know how many developers I met who would keep asking questions that would be obvious to anyone who had read the book. They never got the big picture and just wasted everyone's time, including their own.
"To know everything, you must first know one thing."
But I mean, you can get by without memorizing stuff sure, but memorizing stuff does work out your brain and does help out in the long run? Isn't it possible we've reached the cliff of "helpful" tools to the point we are atrophying enough to be worse at our jobs?
Like, reading is surely better for the brain than watching TV. But constant cable TV wasn't enough to ruin our brains. What if we've got to the point it finally is enough?
> Unlike prose, however (which really should be handed in a polished form to an LLM to maximize the LLM’s efficacy), LLMs can be quite effective writing code de novo.
Don't the same arguments against using LLMs to write one's prose also apply to code? Was this structure of the code and ideas within the engineers'? Or was it from the LLM? And so on.
Before I'm misunderstood as a LLM minimalist, I want to say that I think they're incredibly good at solving for the blank page syndrome -- just getting a starting point on the page is useful. But I think that the code you actually want to ship is so far from what LLMs write, that I think of it more as a crutch for blank page syndrome than "they're good at writing code de novo".
I'm open to being wrong and want to hear any discussion on the matter. My worry is that this is another one of the "illusion of progress" traps, similar to the one that currently fools people with the prose side of things.
The more examples of different types of problems being solved in similar ways present in an LLM's dataset, the better it gets at solving problems. Generally speaking, if it's a solution that works well, it gets used a lot, so "good solutions" become well represented in the dataset.
Human expression, however, is diverse by definition. The expression of the human experience is the expression of a data point on a statistical field with standard deviations the size of chasms. An expression of the mean (which is what an LLM does) goes against why we care about human expression in the first place. "Interesting" is a value closely paired with "different".
We value diversity of thought in expression, but we value efficiency of problem solving for code.
There is definitely an argument to be made that LLM usage fundamentally restrains an individual from solving unsolved problems. It also doesn't consider the question of "where do we get more data from".
>the code you actually want to ship is so far from what LLMs write
I think this is a fairly common consensus, and my understanding is the reason for this issue is limited context window.
1. Take every single function, even private ones.
2. Mock every argument and collaborator.
3. Call the function.
4. Assert the mocks were called in the expected way.
These tests help you find inadvertent changes, yes, but they also create constant noise about changes you intend. “Mock the world, then test your mocks”: after nearly two decades of doing this professionally, I'm simply not convinced these have any value at all.
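For anyone who hasn't run into the pattern, here's a minimal sketch of what such a test tends to look like. This is my own illustration in Python with unittest.mock, not taken from any particular codebase; note how the assertions simply restate the implementation line by line.

```python
# A toy "mock the world, then test your mocks" test: every collaborator is mocked,
# the function under test is called, and the assertions only re-describe the wiring,
# so any intentional refactor of the call sequence breaks the test.
from unittest import TestCase, main
from unittest.mock import MagicMock


def sync_orders(fetcher, normalizer, store):
    """Function under test: fetch orders, normalize each, persist the batch."""
    orders = fetcher.fetch_all()
    cleaned = [normalizer.normalize(o) for o in orders]
    store.save_batch(cleaned)
    return len(cleaned)


class TestSyncOrders(TestCase):
    def test_sync_orders_calls_collaborators(self):
        fetcher = MagicMock()
        normalizer = MagicMock()
        store = MagicMock()
        fetcher.fetch_all.return_value = ["a", "b"]

        sync_orders(fetcher, normalizer, store)

        # These assertions mirror the implementation: they catch accidental
        # changes, but they also fail on every deliberate one.
        fetcher.fetch_all.assert_called_once_with()
        self.assertEqual(normalizer.normalize.call_count, 2)
        store.save_batch.assert_called_once_with(
            [normalizer.normalize.return_value, normalizer.normalize.return_value]
        )


if __name__ == "__main__":
    main()
```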
It can be addressed with prompting, but you have to fight this constantly.
This is one of the problems I feel with LLM-generated code, as well. It's almost always between 5x and 20x (!) as long as it needs to be. Though in the case of code verbosity, it's usually not because of thoroughness so much as extremely bad style.
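To make the style point concrete, here's a hypothetical before/after of my own (not the output of any specific model): both functions behave the same, but the first reflects the padded, comment-everything style I'm describing.

```python
def get_active_user_names_verbose(users):
    # Initialize an empty list to hold the results.
    result_list = []
    # Check that the input is not None before iterating.
    if users is not None:
        # Iterate over every user in the provided collection.
        for user in users:
            # Check whether the user is marked as active.
            if user.get("active"):
                # Extract the name field and append it to the results.
                name_value = user.get("name")
                result_list.append(name_value)
    # Return the accumulated list of names.
    return result_list


def get_active_user_names(users):
    # Same behaviour, one line.
    return [u.get("name") for u in users or [] if u.get("active")]
```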
- I think the "if you use another model" rebuttal is becoming like the No True Scotsman of the LLM world. We can get concrete and discuss a specific model if need be.
- If the use case is "generate this function body for me", I agree that that's a pretty good use case. I've specifically seen problematic behavior for the other ways I'm seeing it OFTEN used, which is "write this feature for me", or trying to one shot too much functionality, where the LLM gets to touch data structures, abstractions, interface boundaries, etc.
- To analogize it to writing: They shouldn't/cannot write the whole book, they shouldn't/cannot write the table of contents, they cannot write a chapter, IMO even a paragraph is too much -- but if you write the first sentence and the last sentence of a paragraph, I think the interpolation can be a pretty reasonable starting point. Bringing it back to code for me means: function bodies are OK. Everything else gets questionable fast IME.
Basically if you are a software engineer you can very easily judge quality of code. But if you aren’t a writer then maybe it is hard for you to judge the quality of a piece of prose.
It depends on the LLM, I think. A lot of people have a bad impression of them as a result of using cheap or outdated LLMs.
A common prompt I use is approximately ”Write tests for file X, look at Y on how to setup mocks.”
This is probably not ”de novo” and in terms of writing is maybe closer to something like updating a case study powerpoint with the current customer’s data.
That's a bold claim. Do they have data to back this up? I'd only have confidence to say this after testing this against multiple LLM outputs, but does this really work for, e.g. the em dash leaderboard of HN or people who tell an LLM to not do these 10 LLM-y writing cliches? I would need to see their reasoning on why they think this to believe.
I wasn't trying to assert that LLMs can find all LLM-generated content (which feels tautologically impossible?), just that they are useful for the kind of LLM-generated content that we seek to detect.
[0] https://rfd.shared.oxide.computer/rfd/0003
[1] https://oxide.computer/careers
[2] https://oxide-and-friends.transistor.fm/episodes/ai-material...
This is or could be a signal for a number of things, but what was particularly disappointing was the heavy emphasis on writing in the application packet and the company culture, as e.g. reiterated by the founder I'm replying to, and yet my writing samples were never even read?

I have been in tech for many years, seen all the bullshit in recruiting and hiring, and performed interviews many times myself, so it wouldn't be altogether surprising that a first-line recruiter throws a resume into a reject pile for <insert reasons>, but then I have so many other questions: why the 3 months delay if it was tossed quickly, and if it truly was read by the/a founder or heavily scrutinized, as somewhat indicated by the post, why did they not access my writing samples? There are just more questions now.

All of this was bothersome, and if I'm being honest, made me question joining the company. But what really made me write this response, given the content of the post I'm replying to, is that I am now worried my application was flagged as LLM-generated. I don't think my writing style is particularly LLMish, but in case that's in doubt, believe me or not: my application, and this response, does not have a single word from an LLM. This is all, sui generis, me, myself, and I. (This doesn't quite explain why my samples weren't accessed, but if I'm being charitable, perhaps the content of the application packet seemed of dubious provenance?)

Regardless, if it was flagged, I suppose the long and short of this little story is: are you sending applicants rejection letters noting this suspicion, at least as a courtesy? If I was the victim of a false positive, I would at least like to know. This isn't some last-ditch attempt to get re-eval'd (the rejection was many months ago); I have a job, and I can reapply in my own time. Even if this was an oversight or mistake (although not accessing the writing samples at all is somewhat of a red flag for me), there is no way they can contact me through this burner account. It's just, like, the principle of it, and the words needed to be said :)

Thank you, and PS, even through it all, I (perhaps now guiltily) still love your podcast :D
I didn't test it and I'm far from an expert, maybe someone can challenge it?
It kinda works, but is not very reliable and is quite sensitive to which model the text was generated with.
This page has nice explanations:
https://www.pangram.com/blog/why-perplexity-and-burstiness-f...
My general procedure for using an LLM to write code, which is in the spirit of what is advocated here, is:
1) First, feed in the existing relevant code into an LLM. This is usually just a few source files in a larger project
2) Describe what I want to do, either giving an architecture or letting the LLM generate one. I tell it to not write code at this point.
3) Let it speak about the plan, and make sure that I like it. I will converse to address any deficiencies that I see, and I almost always do.
4) I then tell it to generate the code
5) I skim & test the code to see if it's generally correct, and have it make corrections as needed
6) Closely read the entire generated artifact at this point, and make manual corrections (occasionally automatic corrections like "replace all C style casts with the appropriate C++ style casts" then a review of the diff)
The hardest part for me is #6, where I feel a strong emotional bias towards not doing it, since I am not yet aware of any errors compelling such action.
This allows me to operate at a higher level of abstraction (architecture) and remove the drudgery of turning an architectural idea into written, precise code. But, when doing so, you are abandoning those details to a non-deterministic system. This is different from, for example, using a compiler or higher-level VM language. With those other tools, you can understand how they work, rapidly develop a good idea of what you're going to get, and rely on robust assurances. Understanding LLMs helps, but not to the same degree.
I always wonder about the people who say LLMs save them so much time: Do you just accept the edits they make without reviewing each and every line?
It's not magic though, this still takes some time to do.
Anything that involves math or complicated conditions I take extra time on.
I feel I’m getting code written 2 to 3 times faster this way while maintaining high quality and confidence
1. Keeping the context very small
2. Keeping the scope of the output very small
With the added benefit of keeping you in the flow state (and in my experience making it more enjoyable).
To anyone who even hates LLMs: give autocomplete a shot (with a keybinding to toggle it if it annoys you, sometimes it's awful). It's really no different than typing it manually wrt quality etc, so the speed-up isn't huge, but it feels a lot nicer.
I've seen LLMs write some really bad code a few times lately it seems almost worse than what they were doing 6 or 8 months ago. Could be my imagination but it seems that way.
Insert before that: have it create tasks with beads and force it to let you review before marking a task complete
If you keep all edits to be driven by the LLM, you can use that knowledge later in the session or ask your model to commit the guidelines to long term memory.
Personally, I absolutely hate instructing agents to make corrections. It's like pushing a wet noodle. If there is lots to correct, fix one or two cases manually and tell the LLM to follow that pattern.
You obviously cannot emotionally identify with the code you produce this way; the ownership you might feel towards such code is nowhere near what meticulously hand-written code elicits.
That's not how it works. If you ask an LLM to write Harry Potter and it writes something that is 99% the same as Harry Potter, it isn't magically free of copyright. That would obviously be insane.
The legal system is still figuring out exactly what the rules are here but it seems likely that it's going to be on the LLM user to know if the output is protected by copyright. I imagine AI vendors will develop secondary search thingies to warn you (if they haven't already), and there will probably be some "reasonable belief" defence in the eventual laws.
Either way it definitely isn't as simple as "LLM wrote it so we can ignore copyright".
To me, this is what seems more insane! If you've never read Harry Potter, and you ask an LLM to write you a story about a wizard boy, and it outputs 80% Harry Potter - how would you even know?
> there will probably be some "reasonable belief" defence in the eventual laws.
This is probably true, but it's irksome to shift all blame away from the LLM producers, who use copyrighted data to peddle copyright-infringing output. This simply turns the business into copyright infringement as a service - what incentive would they have to actually build those "secondary search thingies" and build them well?
> it definitely isn't as simple as "LLM wrote it so we can ignore copyright".
Agreed. The copyright system is getting stress tested. It will be interesting to see how our legal systems can adapt to this.
The obvious way is by searching the training data for close matches. LLMs need to do that and warn you about it. Of course the problem is they all trained on pirated books and then deleted them...
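As a rough sketch of what "searching for close matches" could even mean, here is a toy of my own; real systems would use proper near-duplicate detection (e.g. MinHash over an indexed corpus), not a naive n-gram overlap like this, and the names below are all hypothetical.

```python
# Naive near-duplicate check: compare generated output against a small corpus using
# character n-gram overlap and flag anything above a threshold.

def ngrams(text, n=8):
    text = " ".join(text.lower().split())
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}


def overlap(generated, source, n=8):
    a, b = ngrams(generated, n), ngrams(source, n)
    return len(a & b) / max(len(a | b), 1)  # Jaccard similarity in [0, 1]


def flag_close_matches(generated, corpus, threshold=0.4):
    """Return (doc_id, score) for corpus entries whose overlap looks suspicious."""
    return [(doc_id, score)
            for doc_id, text in corpus.items()
            if (score := overlap(generated, text)) >= threshold]
```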
But either way it's kind of a "your problem" thing. You can't really just say "I invented this great tool and it sometimes lets me violate copyright without realising. You don't mind do you, copyright holders?"
(From what I understand, the amount of human input that's required to make the result copyrightable can be pretty small, perhaps even as little as selecting from multiple options. But this is likely to be quite a gray area.)
Shouldn't the rights extend forward and simply require the LLM-generated code to be deleted?
Those kinds of cases, although they do happen, are exceptional. A typical output that doesn't line-for-line resemble a single training input is considered a new, but non-copyrightable, work.
You should be careful about speaking in absolute terms when talking about copyright.
There is nothing that prevents multiple people from owning copyright to identical works. This is also why copyright infringement is such a mess to litigate.
I'd also be interested in knowing why you think code generated by LLMs can't be copyrighted. That's quite a statement.
There's also the problem with copyright law and different jurisdictions.
> Beats me. AI decided to do so and I didn't question it. I did ask AI to look at the OxCaml implementation in the beginning.
This shows that the problem with AI is philosophical, not practical
It seems more like a non-experienced guy asked the LLM to implement something, and the LLM just output what an experienced guy did before, and it even gave him the credit
(It is, of course, exceptionally lazy to leave such things in if you are using the LLM to assist you with a task, and can cause problems of false attribution. Especially in this case where it seems to have just picked a name of one of the maintainers of the project)
Note: I, myself, am guilty of forking projects, adding some simple feature I need with an LLM quickly because I don’t want to take the time to understand the codebase, and using it personally. I don’t attempt to upstream changes like this and waste maintainers’ time until I actually take the time myself to understand the project, the issue, and the solution.
It sets the rule that things must be actually read when there’s a social expectation (code interviews for example) but otherwise… remarks that use of LLMs to assist comprehension has little downside.
I find two problems with this:
- there is incoherence there. If LLMs are flawless at reading and summarization, there is no difference from reading the original. And if they aren’t flawless, then that flaw also extends to the non-social stuff.
- in practice, I haven’t found LLMs so good as reading assistants. I’ve sent them to check a linked doc and they’ve just read the index and inferred the context, for example. Just yesterday I asked for a comparison of three technical books on a similar topic, and it wrongly guessed the third one rather than follow the three links.
There is a significant risk in placing a translation layer between content and reader.
I think you got this backwards, because I don't think the RFD said that at all. The point was about a social expectation for writing, not for reading.
>using LLMs to assist comprehension should not substitute for actually reading a document where such reading is socially expected.
I would consider this a failure in their tool use capabilities, not their reading ones.
To use them to read things (without relying on their much less reliable tool use) take the thing and put it in the context window yourself.
They still aren't perfect of course, but they are reasonably good.
Three whole books likely exceeds their context window size of course, I'd take this as a sign that they aren't up to a task of that magnitude yet.
This was not “read all three books”, this was “check these three links with the (known) book synopsis/reviews there” and it made up the third one.
>I would consider this a failure in their tool use capabilities, not their reading ones.
I'd give it to you if I got an error message, but the text being enhanced with wrong-but-plausible data is clearly a failure of reliability.
I think this points out a key point.. but I'm not sure the right way to articulate it.
A human-written comment may be worth something, but an LLM-generated one is cheap/worthless.
The nicest phrase capturing the thought I saw was: "I'd rather read the prompt".
It's probably just as good to let an LLM generate it again, as it is to publish something written by an LLM.
Text, images, art, and music are all methods of expressing our internal ideas to other human beings. Our thoughts are the source, and these methods are how they are expressed. Our true goal in any form of communication is to understand the internal ideas of others.
An LLM expresses itself in all the same ways, but the source doesn't come from an individual - it comes from a giant dataset. This could be considered an expression of the aggregate thoughts of humanity, which is fine in some contexts (like retrieval of ideas and information highly represented in the data/world), but not when presented in a context of expressing the thoughts of an individual.
LLMs express the statistical summation of everyone's thoughts. They present the mean, when what we're really interested in are the data points a couple of standard deviations away from the mean. That's where all the interesting, unique, and thought-provoking ideas are. Diversity is a core of the human experience.
---
An interesting paradox is the use of LLMs for translation into a non-native language. LLMs are actively being used to better express an individual's ideas using words better than they can with their limited language proficiency, but for those of us on the receiving end, we interpret the expression to mirror the source and have immediate suspicions on the legitimacy of the individual's thoughts. Which is a little unfortunate for those who just want to express themselves better.
A comment is an attempt to more fully document the theory the programmer has. Not all theory can be expressed in code. Both code and comment are lossy artefacts that are "projections" of the theory into text.
LLMs currently, I believe, cannot have a theory of the program. But they can definitely perform a useful simulacrum of such. I have not yet seen an LLM generated comment that is truly valuable. Of course, lots of human generated comments are not valuable either. But the ceiling for human comments is much, much higher.
For example, I recently was perusing the /r/SaaS subreddit and could tell that most of the submissions were obviously LLM-generated, but often by telling a story that was meant to spark outrage, resonate with the “audience” (eg being doubted and later proven right), and ultimately conclude by validating them by making the kind of decision they typically would.
I also would never pass this off as anything else, but I’ve been finding it effective to have LLMs write certain kinds of documentation or benchmarks in my repos, just so that they/I/someone else have access to metrics and code snippets that I would otherwise not have time to write myself. I’ve seen non-native English speakers write pretty technically useful/interesting docs and tech articles by translating through LLMs too, though a lot more bad attempts than good (and you might not be able to tell if you can’t speak the language)…
Honestly the lines are starting to blur ever so slightly for me. I’d still not want someone using an LLM to chat with me directly, but if someone could have an LLM build a simple WASM/interesting game and then write an interesting/informative/useful article about it, or steer it into doing so… I might actually enjoy it. And not because the prompt was good: instructions telling an LLM to go make a game and do a write-up don’t help me as much or in the same way as being able to quickly see how well it went and any useful takeaways/tricks/gotchas it uncovered. It would genuinely be giving me valuable information and probably wouldn’t be something I’d speculatively try or run myself.
That’s what I think when I see a news headline. What are you writing? Who cares. WHY are you writing it — that is what I want to know.
They seem to be good at either spitting out something very average, or something completely insane. But something genuinely indicative of the spark of intelligence isn’t common at all. I’m happy to know that while my thoughts are likely not original, they are at least not statistically likely.
This is a technical document that is useful in illustrating how the guy who gave a talk once that I didn’t understand but was captivated by and is well-respected in his field intends to guide his company’s use of the technology so that other companies and individual programmers may learn from it too.
I don’t think the objective was to take any outright ethical stance, but to provide guidance about something ostensibly used at an employee’s discretion.
> First, to those who can recognize an LLM’s reveals (an expanding demographic!), it’s just embarrassing — it’s as if the writer is walking around with their intellectual fly open. But there are deeper problems: LLM-generated writing undermines the authenticity of not just one’s writing but of the thinking behind it as well. If the prose is automatically generated, might the ideas be too? The reader can’t be sure — and increasingly, the hallmarks of LLM generation cause readers to turn off (or worse).
> Specifically, we must be careful to not use LLMs in such a way as to undermine the trust that we have in one another
> our writing is an important vessel for building trust — and that trust can be quickly eroded if we are not speaking with our own voice
This applies to natural language, but, interestingly, the opposite is true of code (in my experience and that of other people that I've discussed it with).
Reading good code can be a better way to learn about something than reading prose. Writing code like that takes some real skill and insight, just like writing clear explanations.
> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?
believing this in 2025 is really fascinating. this is like believing Meta won’t use info they (il)legally collected about you to serve you ads
I think the review by the prompt writer should be at a higher level than another person who reviews the code.
If I know how to do something, it is easier for me to avoid mistakes while doing it. When I'm reviewing it, it requires different pathways in my brain. Since there is code out there, I'm drawn to that path, and I might not always spot the problem points. Or the code might be written in a way that I don't recognize, but still exhibits the same mistake.
In the past, as a reviewer I used to be able to count on my colleagues' professionalism to be a moat.
The size of the moat is inverse to the amount of LLM generated code in a PR / project. At a certain moment you can no longer guarantee that you stand behind everything.
Combine that with the push to do more faster, with less, meaning we're increasing the amount of tech debt we're taking on.
Is there any evidence for this?
Yes, allow the use of LLMs, encourage your employees to use them to move faster by rewarding "performance" regardless of risks, but make sure to place responsibility of failure upon them so that when it happens, the company culture should not be blamed.
<offtopic> The "RFD" here stands for "Reason/Request for Decision" or something else? (Request for Decision doesn't have a nice _ring_ on it tbh). I'm aware of RFCs ofc and the respective status changes (draft, review, accepted, rejected) or ADR (Architectural Decision Record) but have not come across the RFD acronym. Google gave several different answers. </offtopic> </offtopic>
it's just utterly hopeless how bad they are at doing it
even if I break it down into parts once you get into the stuff that actually matters i.e. the physics, event handling, and game logic induced by events, it just completely falls apart 100% of the time
These things aren't hard if you're familiar with the documentation and have made them before, but there is an extreme dearth of information about them compared to web dev tutorials.
Actually a lot people dispute this and I'm sure the author knows that!
Doesn't mean you shouldn't ever do so, but there are tradeoffs that become obvious as soon as you start attempting it.
To extend that: If the LLM is the author and the responsible engineer is the genuine first reviewer, do you need a second engineer at all?
Typically in my experience one review is enough.
Why introduce a second reviewer and reduce the rumoured velocity gained by LLMs? After all, “it doesn’t matter what wrote the code” right.
I say let her rip. Or as the kids say, code goes brrr.
anyone who is doing serious enough engineering that they have the rule of "one human writes, one human reviews" wants two humans to actually put careful thought into a thing, and only one of them is deeply incentivised to just commit the code.
your suggestion means less review and worse incentives.
[1] https://github.com/oxidecomputer/meta/tree/master/engineerin...
To be honest there's really no secret sauce in there. It's primarily how to get started with agents, when to abandon your context and start anew, and advice on models, monitoring cost, and prompting. This is not to diminish the value of the information as it's good information written by great colleagues. I just wanted to note that most of the information can be obtained from the official AI provider documentation and blog posts from AI boosters like Thorsten Ball.
By this own article's standards, now there are 2 authors who don't understand what they've produced.
I couldn't disagree more. (In fact I'm shocked that Bryan Cantrill uses words like "comprehension" and "meaningfully" in relation to LLMs.)
Summaries provided by ChatGPT, conclusions drawn by it, contain exaggerations and half-truths that are NOT there in the actual original sources, if you bother enough to ask ChatGPT for those, and to read them yourself. If your question is only slightly suggestive, ChatGPT's tuning is all too happy to tilt the summary in your favor; it tells you what you seem to want to hear, based on the phrasing of your prompt. ChatGPT presents, using confident and authoritative language, total falsehoods and deceptive half-truths, after parsing human-written originals, be the latter natural language text, or source code. I now only trust ChatGPT to recommend sources to me, and I read those -- especially the relevant-looking parts -- myself. ChatGPT has been tuned by its masters to be a lying sack of shit.
I've recently asked ChatGPT a factual question: I asked it about the identity of a public figure (an artist) whom I had seen in a video on youtube. ChatGPT answered with "Person X", and even explained why Person X's contribution was so great to the piece of art in question. I knew the answer was wrong, so I retorted only with: "Source?". Then ChatGPT apologized, and did the exact same thing, just with "Person Y"; again explaining why Person Y was so influential in making that piece of art so great. I knew the answer was wrong still, so I again said: "Source?". And at the third attempt, ChatGPT finally said "Person Z", with a verifiable reference to a human-written document that identified the artist.
FUCK ChatGPT.
The question is not whether one (1) LLM can replace one (1) expert.
Rather, it is how much farther an expert can get through better tooling. In my experience, it can be pretty far indeed.
If you hand me a financial report, I expect you used Excel or a calculator. I don't feel cheated that you didn't do long division by hand to prove your understanding. Writing is no different. The value isn't in how much you sweated while producing it. The value is in how clear the final output is.
Human communication is lossy. I think X, I write X' (because I'm imperfect), you understand Y. This is where so many misunderstandings and workplace conflicts come from. People overestimate how clear they are. LLMs help reduce that gap. They remove ambiguity, clean up grammar, and strip away the accidental noise that gets in the way of the actual point.
Ultimately, outside of fiction and poetry, writing is data transmission. I don't need to know that the writer struggled with the text. I need to understand the point clearly, quickly, and without friction. Using a tool that delivers that is the highest form of respect for the reader.
I would extend the argument further to say it applies to lots of human generated content as well. Especially sales and marketing information which similarly elicit very low trust.
Clarity is useless if it's inaccurate.
Excel is deterministic. ChatGPT isn't.
I’ve come across a lot of people recently online expressing anger and revulsion at any images or artwork that have been created by genAI.
For relatively mundane purposes, like marketing materials, or diagrams, or the sort of images that would anyway be sourced from a low-cost image library, I don’t think there’s an inherent value to the “art”, and don’t see any problem with such things being created via genAI.
Possible consequences:
1) Yes, this will likely lead to loss/shifts in employment, but hasn’t progress always been like this? People have historically reacted strongly against many such shifts when advancing technology threatens some sector, but somehow we always figure it out and move on.
2) For genuine art, I suspect this will in time lead to a greater value being placed on demonstrably human-created originals. Relatedly, there’s probably money to be made by whoever can create a trusted system for capturing proof of human work, in a way that can’t be cheated or faked.
At this point, who really cares what the person who sees everything as "AI slop" thinks?
I would rather just interact with Gemini anyway. I don't need to read/listen to the "AI slop hunter" regurgitate their social media feed and NY Times headlines back to me like a bad language model.
This probably doesn't give them enough credit. If you can feed an LLM a list of crash dumps, it can do a remarkable job producing both analyses and fixes. And I don't mean just for super obvious crashes. I was most impressed with a deadlock where numerous engineers had tried and failed to understand exactly how to fix it.
This is sort of the opposite of vibe coding, but LLMs are OK at that too.
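For context, the shape of bug I mean in that deadlock example is the classic lock-ordering one; here is a minimal Python sketch of my own (a toy, not the actual incident), where a dump would show each thread parked on the lock the other one holds.

```python
# Classic lock-ordering deadlock: two workers take the same pair of locks in
# opposite orders. Running them concurrently can hang forever, so the threads
# are deliberately not started here.
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()


def worker_one():
    with lock_a:        # holds A ...
        with lock_b:    # ... then waits for B
            pass


def worker_two():
    with lock_b:        # holds B ...
        with lock_a:    # ... then waits for A
            pass

# The conventional fix is a single global acquisition order (always A before B),
# which is easy to state and tedious to audit by hand across a large codebase.
```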
Oooo I like that. Will try and remember that one.
Amusingly, my experience is that the longer an issue takes me to debug the simpler and dumber the fix is. It's tragic really.
Naturally this doesn’t factor in things like human obsolescence, motivation and self-worth.
I've been thinking about this as I do AoC with Copilot enabled. It's been nice for those "hmm, how do I do that in $LANGUAGE again?" moments, but it's also written some nice-looking snippets that don't do quite what I want them to. And many cases of "hmmm... that would work, but it would read the entire file twice for no reason".
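A made-up miniature of that last pattern (my own example, not my actual AoC code): the first version opens the input once per question, while a single pass would do.

```python
def part_counts_suggested(path):
    with open(path) as f:
        total_lines = len(f.readlines())                         # first full read
    with open(path) as f:
        blank_lines = sum(1 for line in f if not line.strip())   # second full read
    return total_lines, blank_lines


def part_counts_single_pass(path):
    total_lines = blank_lines = 0
    with open(path) as f:
        for line in f:              # one pass does both counts
            total_lines += 1
            blank_lines += not line.strip()
    return total_lines, blank_lines
```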
My guess, however, is that it's a net gain for quality and productivity. Humans make bugs too and there need to be processes in place to discover and remediate those regardless.
I'm currently trying out using Opus 4.5 to take care of a gnarly code reorganization that would take a human most of a week to do -- I spent a day writing a spec (by hand, with some editing advice from Claude Code), having it reviewed as a document for humans by humans, and feeding it into Opus 4.5 on some test cases. It seems to work well. The spec is, of course, in the form of an RFD, which I hope to make public soon.
I like to think of the spec as basically an extremely advanced sed script described in ~1000 English words.
There are things in life that have high risks of harm if misused yet people still use them because there are great benefits when carefully used. Being aware of the risks is the key to using something that can be harmful, safely.
The document includes statements like "LLMs are superlative at reading comprehension", "LLMs can be excellent editors", "LLMs are amazingly good at writing code".
The caveats are really useful: if you've anchored your expectations on "these tools are amazing", the caveats bring you closer to what they've observed.
Or, if you're anchored on "the tools aren't to be used", the caveats give credibility to the document's suggestions of the LLMs are useful for.
I don’t think it is easy to create a concise set of rules to apply in this gap for something as general as LLM use, but I do think such a ruleset is noticeably absent here.
THOU SHALT OWN THE CODE THAT THOU DOST RENDER.
All other values should flow from that, regardless of whether the code itself is written by you or AI or by your dog. If you look at the values in the article, they make sense even without LLMs in the picture.
The source of workslop is not AI, it's a lack of ownership. This is especially true for Open Source projects, which are seeing a wave of AI slop PRs precisely because the onus of ownership is largely on the maintainers and not the upstart "contributors."
Note also that this does not imply a universal set of values. Different organizations may well have different values for what ownership of code means -- e.g. in the "move fast, break things" era of Facebook, workslop may have been perfectly fine for Zuck! (I'd bet it may even have hastened the era of "Move fast with stable infrastructure.") But those values must be consistently applied regardless of how the code came to be.
Maybe for simple braindead tasks you can do yourself anyway.
Try doing it on something actually hard or complex and they get it wrong 100/100 if they don't have adequate training data, and 90/100 if they do.
He is a long way from Sun.
What about Oxide? Oxide is funded by Eclipse Ventures, which has now installed a Trump-friendly person:
https://www.reuters.com/business/finance/vc-firm-eclipse-tap...
> Sorry, not interested in trivial changes like that.
- bnoordhuis
As a non-native English speaker, I think the change itself is okay (women will also occasionally use computers), but saying you're not interested in merging it is kinda cringe, for lack of a better term - do you not realize that people will take issue with this and you're turning a trivial change into a messy discussion? Stop being a nerd and merge the damn changeset; it won't break anything either, read the room. Admittedly, I also view the people arguing in the thread as similarly cringe, purely on the basis that if someone is uninterested in or opposed to stuff like this, you are exceedingly unlikely to be able to make them care.
Feels the same as how allowlist/denylist reads more cleanly, or how main as a branch name uses a very common word - as long as updating your CI config isn't too much work. To show a bit of empathy the other way, maybe people get tired of too many changes like that (e.g. if most of the stuff you review is just people poking the docs by rewording stuff to be able to say that they contributed to project X). Or maybe people love to take principled stances and to argue, idk.
> ...it’s not the use of the gendered pronoun that’s at issue (that’s just sloppy), but rather the insistence that pronouns should in fact be gendered.
Yeah, odd thing to get so fixated on when the they/them version is more accurate in this circumstance. While I don't cause drama when I see gendered ones (again, most people here have English as a second language), I wouldn't argue with someone a bunch if they wanted to correct the docs or whatever.
https://landley.net/history/mirror/linux/kissedagirl.html
He wasn't fired or canceled. It is great to see Gen-Xers and Boomers having all the fun in the 1980s and 1990s and then going all prissy on younger people in the 2010s and trying to ruin their careers.
> This can be extraordinarily powerful for summarizing documents — or of answering more specific questions of a large document like a datasheet or specification.
That dash shouldn't be there. That's not a parenthetical clause, that's an element in a list separated by "or." You can just remove the dash and the sentence becomes more correct.
British users regularly use that sort of construct with "-" hyphens, simply because they're pretty much the same and a whole lot easier to type on a keyboard.
"lack of conviction" would be a useful LLM metric.
I was hoping he'd make the leaderboard, but perhaps the addiction took proper hold in more recent years:
https://www.gally.net/miscellaneous/hn-em-dash-user-leaderbo...
https://news.ycombinator.com/user?id=bcantrill
No doubt his em dashes are legit, of course.
However, I was surprised to see that when someone (not me) accused him of using an LLM to write his comment, he flatly denied it: https://news.ycombinator.com/item?id=46011964
Which I guess means (assuming he isn't lying) if you spend too much time interacting with LLMs, you eventually resemble one.
Pretty much. I think people who care about reducing their children's exposure to screen time should probably take care to do the same for themselves wrt LLMs.
1. Because reading posts like this 2. Is actually frustrating as hell 3. When everything gets dragged around and filled with useless anecdotes and 3 adjective mumbojumbos and endless emdashes — because somehow it's better than actually just writing something up.
Which just means that people in tech or in general have no understanding what an editor does.
https://www.nobelprize.org/prizes/physics/1965/feynman/speec...
Are you speaking about words like “shall”? I didn’t notice them, but in RFCs those are technical terms which carry precise meaning.
Using Large Language Models (LLMs) at Oxide
This document explains how we should think about using LLMs (like ChatGPT or similar tools) at Oxide.
What are LLMs?
LLMs are very advanced computer programs that can understand and generate text. They've become a big deal in the last five years and can change how we work. But, like any powerful tool, they have good and bad sides. They are very flexible, so it’s hard to give strict rules about how to use them. Still, because they are changing so fast, we need to think carefully about when and how we use them at Oxide.
What is Important When Using LLMs
We believe using LLMs should follow our core values:
Responsibility:
We are responsible for the work we produce. Even if we use an LLM to help, a human must make the final decisions. The person using the LLM is responsible for what comes out.
Rigor (Care and Precision):
LLMs can help us think better or find mistakes, but if we use them carelessly, they can cause confusion. We should use them to improve our work, not to cut corners.
Empathy:
Remember, real people read and write what we produce. We should be kind and respectful in our language, whether we are writing ourselves or letting an LLM help.
Teamwork:
We work as a team. Using LLMs should not break trust among team members. If we tell others we used an LLM, it might seem like we’re avoiding responsibility, which can hurt trust.
Urgency (Doing Things Quickly):
LLMs can help us work faster, but we shouldn’t rush so much that we forget responsibility, care, and teamwork. Speed is good, but not at the cost of quality and trust.
How We Use LLMs
LLMs can be used in many ways. Here are some common uses:
1. As Readers
LLMs are great at quickly understanding documents, summaries, or answering questions about texts.
Important: When sharing documents with an LLM, make sure your data is private. Also, remember that uploading files might allow the LLM to learn from your data unless you turn that off.
Note: Use LLMs to help understand documents, but don’t skip reading them yourself. LLMs are tools, not replacements for reading carefully.
2. As Editors
LLMs can give helpful feedback on writing, especially after you’ve written a draft. They can suggest improvements in structure and wording.
Caution: Sometimes, LLMs may flatter your work too much or change your style if used too early. Use them after you’ve done some work yourself.
3. As Writers
LLMs can write text, but their writing can be basic or obvious. Sometimes, they produce text that shows it was made by a machine.
Why be careful? If readers see that the writing is from an LLM, they might think the author didn’t put in enough effort or doesn’t truly understand the ideas.
Our rule: Usually don’t let LLMs write your final drafts. Use them to help, but own your words and ideas.
4. As Code Reviewers
LLMs can review code and find problems, but they can also miss issues or give bad advice. Use them as a helper, not a replacement for human review.
5. As Debuggers
LLMs can sometimes help find solutions to tricky problems. They might give helpful hints. But don’t rely on them too much—use them as a second opinion.
6. As Programmers
LLMs are very good at writing code, especially simple or experimental code. They can be useful for quick tasks like writing tests or prototypes.
Important: When an LLM writes code, the person responsible must review it carefully. Responsibility for the code stays with the human.
Teamwork: If you use an LLM to generate code, make sure you understand and review it yourself first.
How to Use LLMs Properly
There are detailed guidelines and tips in the internal document called "LLMs at Oxide."
In general:
Using LLMs is encouraged, but always remember your responsibilities—to your product, your customers, and your team.