A measured, comprehensive, and sensible take. Not surprising from Bryan. This was a nice line:

> it’s just embarrassing — it’s as if the writer is walking around with their intellectual fly open.

I think Oxide didn't include this in the RFD because they exclusively hire senior engineers, but in an organization that contains junior engineers I'd add something specific to help them understand how they should approach LLM use.

Bryan has 30+ years of challenging software (and now hardware) engineering experience. He memorably said that he's worked on and completed a "hard program" (an OS), which he defines as a program you doubt you can actually get working.

The way Bryan approaches an LLM is super different to how a 2025 junior engineer does so. That junior engineer possibly hasn't programmed without the tantalizing, even desperately tempting option to be assisted by an LLM.

pests · 20 hours ago
> That junior engineer possibly hasn't programmed without the tantalizing, even desperately tempting option to be assisted by an LLM.

Years ago I had to spend many months building nothing but Models (as in MVC) for a huge data import / ingest the company I worked at was rewriting. It was just messy enough that it couldn't be automated. I almost lost my mind from the dull monotony and even started having attendance issues. I know today that could have been done with an LLM in minutes. It's almost crazy how much time I put into that project compared to if I did it today.

The issue is that it might look good, but an LLM often inserts weird mistakes. Or ellipses. Or overindexes on the training data. If someone is not careful it is easy to completely wreck the codebase by piling on seemingly innocuous commits. So far I have developed a good sense for when I need to push the LLM to avoid sloppy code. It is all in the details.

But a junior engineer would never find/anticipate those issues.

I am a bit concerned. Because the kind of software I am making, an LLM would never produce on its own from a prompt. A junior cannot make it; it requires research and programming experience that they do not have. But I know that if I were a junior today, I would probably try to use LLMs as much as possible and would probably know less programming over time.

So it seems to me that we are likely to have worse software over time. Perhaps a boon for senior engineers but how do we train junior devs in that environment? Force them to build slowly, without llms? Is it aligned with business incentives?

Do we create APIs expecting the code to be generated by LLMs or written by hand? Because the impact of verbosity is not necessarily the same. LLMs don't get tired as fast as humans.

> So it seems to me that we are likely to have worse software over time.

IMO, it's already happening. I had to change some personal information on a bunch of online services recently, and two out of seven of them were down. One of them is still down, a week later. This is the website of a major utilities company. When I call them, they acknowledge that it's down, but say my timing is just bad. That combined with all the recent outages has left me with the impression that software has been getting (even more) unreliable, recently.

LLMs are trained on code people had to make sacrifices for: deadlines, shortcuts, etc. And code people were simply too ignorant to be writing in the first place. Lots of code with hardly any coding standards.

So of course it’s going to generate code that has non-obvious bugs in it.

Ever play the Undefined Behaviour Game? Humans are bad at being compilers and catching mistakes.

I’d hoped… maybe still do, that the future of programming isn’t a shrug and, “good enough.” I hope we’ll keep developing languages and tools that let us better specify programs and optimize them.

If it's such a mind-numbing problem, though, it's easy to check, and the checking you do after the LLM will be much less work than writing every field yourself (implicitly "checking" it as you write it).

Obviously if it's anything even mildly complex, you can't trust that the LLM hasn't found a new way to fool you.

pests · 38 minutes ago
This is exactly it. There wasn't any complex logic. Just making sure the right fields were mapped, some renaming, and sometimes some more complex joins depending on the incoming data source and how it was represented (say, multiple duplicate rows, or a single field with comma-delimited IDs from somewhere else). I would have much rather scanned the LLM output line by line (and most would be simple, not very indented) than hand-writing it from scratch. I do admit it would take some time to review and cross-reference, but I have no doubt it would have been a fraction of the time and effort.
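To make the shape of that work concrete, here is a hypothetical sketch of the kind of mapping being described (pandas assumed; the column names are invented for illustration):

    import pandas as pd

    # Hypothetical field mapping: rename source columns, split a comma-delimited ID
    # column into one row per ID, and drop duplicate rows from the source extract.
    FIELD_MAP = {"cust_nm": "customer_name", "acct_no": "account_number", "tag_ids": "tag_ids"}

    def ingest(raw: pd.DataFrame) -> pd.DataFrame:
        df = raw.rename(columns=FIELD_MAP)[list(FIELD_MAP.values())]
        df["tag_ids"] = df["tag_ids"].str.split(",")
        return df.explode("tag_ids").drop_duplicates()

Mechanical to write, quick to review: exactly the kind of code where scanning an LLM's output beats typing it all out by hand.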
· 15 hours ago
I remember in the very first class I ever took on Web Design the teacher spent an entire semester teaching "first principles" of HTML, CSS and JavaScript by writing it in Notepad.

Only then did she introduce us to the glory that was Adobe Dreamweaver, which (obviously) increased our productivity tenfold.

Dreamweaver absolutely destroyed the code with all kinds of tags and unnecessary stuff. Especially if you used the visual editor. It was fun for brainstorming, but plain Notepad with clean, understandable code was far far better (and with the browser compatibility issues, the only option if you were going to production).

After 25 or so years doing this, I think there are two kinds of developers: craftsmen and practical “does it get the job done” types. I’m the former. The latter seem to be what makes the world go round.
I am both: I own a small agency, where I have to be practical, and have fun crafting code on the hobby side.

I think what craftsmen miss is the different goals. Projects fall on a spectrum, from long-lived apps that constantly evolve with a huge team working on them, to code that is never opened again after release. In the latter, like movie or music production (or most video games), only the end result matters; the how is not part of the final product. Working for years with designers and artists really gave me perspective on process vs end result and what matters.

That doesn’t mean the end result is messy or doesn’t have craftsmanship. If you call a general contractor or carpenter for something specific, you care that the end result is well made, but if they tell you that they built a whole factory for your little custom-made project (the equivalent of a nice codebase), not only does it not matter to you, it’ll also be wildly overpriced and delayed. In my agency that means the website is good looking and bug free after being built, no matter how messy the temporary construction site was.

In contrast if you work on a SaaS or a long lived project (e.g. an OS) the factory (the code) is the product.

So to me when people say they are into code craftsmanship I think they mean in reality they are more interested in factory building than end product crafting.

I also do third party software development, and my approach is always: bill (highly, $300+/hr) for the features and requirements, but do the manual refactoring and architecture/performance/detail work on your own time. It benefits you, it benefits the client, it benefits the relationship, and it handles the misunderstanding of your normie clients with regard to what constitutes "working".

Say it takes 2 hours to implement a feature, and another hour making it logically/architecturally correct. You bill $600 and eat the $300 for that hour as goodwill and your own personal/organizational development. You're still making $200/hr and you never find yourself in meetings with normie clients about why refactoring, cohesiveness, or quality was necessary.

I agree wholeheartedly. As for why craftsmen care so much about the factory instead of the product, I believe the answer is pride. It’s a bitter pill to swallow, but writing and shipping a hack is sometimes the high road.

If you've been doing it for that long (about as long as I have), then surely you remember all the times you had to clean up after the "git 'er done" types.

I'm not saying they don't have their place, but without us they would still be making the world go round. Only backwards.

I work in digital forensics and incident response. The “git ‘er done” software engineers have paid my mortgage and are putting my kids through private schooling.
> all the times you had to clean up after the "git 'er done" types

It’s lovely to have the time to do that. This time comes once the other type of engineer has shipped the product and turned the money flow on. Both types have their place.

· 10 hours ago
Well, going round in a circle does project to going forwards then backwards in a line :)

I think there are more dimensions that also matter a bunch:

  * a bad craftsman will get pedantic about the wrong things (e.g. SOLID/DRY as dogma) and will create architectures that will make development velocity plummet ("clever" code, deep inheritance chains, "magic" code with lots of reflection etc.)
  * a bad practitioner will not care about long-term maintainability either, or even care about correctness enough not to introduce a bunch of bad bugs or slop; even worse when they're subtle enough to ship but mess up your schema or something
So you can have both good and bad outcomes with either, just for slightly different reasons (caring about the wrong stuff vs not caring).

I think the sweet spot is to strive for code that is easy to read and understand, easy to change, and easy to eventually replace or throw out. Obviously performant enough but yadda yadda premature optimization, depends on the domain and so on...

After becoming a founder and having to deal with my own code for a decade, I’ve learned a balance. Prototype fast with AI crap to get the insight, then write slowly with structure for stuff that goes to production. AI does not touch production code: ask it when needed to fix a tiny bit, but keep the beast at arm's length.

It takes both.
The HTML generated by Dreamweaver's WYSIWYG mode might not have been ideal, but it was far superior to the mess produced by MS FrontPage. With Dreamweaver, it was at least possible to use the output as a starting point.

Judicious and careful use of Dreamweaver (its visual editor and properties bar) enabled me to write exactly the code I wanted. I used Dreamweaver for table layouts and HomeSite (later TopStyle) for broader code edits. At that time I was famous within the company for being able to make any layout. Good times!

MS FrontPage also went out of its way to do the same.
_joel · 13 hours ago
It might have been pretty horrible, but I hold fond memories of FrontPage 97; it started my IT career, although not for HTML reasons.

The _vti_cnf dir left /etc/passwd downloadable, so I grabbed it from my school website. One John the Ripper run later and the password was found.

I told the teacher responsible for IT that it was insecure, and that ended up getting me some work experience. I ended up working the summer (waiting for my GCSE results) for ICL, which immeasurably helped me when it was time to properly start working.

I did think about defacing it; I often wonder how things could have turned out very differently!

pram · 15 hours ago
It’s funny this came up, because it was kinda similar to the whole “AI frauds” thing these days.

I don’t particularly remember why, but “hand writing” fancy HTML and CSS used to be a flex in some circles in the 90s. A bunch of junk and stuff like fixed positioning in the source was the telltale sign they “cheated” with FrontPage or Dreamweaver lol

My only gripe was that they tended to generate gobs of “unsemantic” HTML. You resized a table and expected it to be based on viewport width? No! It’s hardcoded to “width: X px”, whatever size your viewport happened to be set to.
> glory that was Adobe Dreamweaver

Dreamweaver was to web development what ...

I just sat here for 5 minutes and I wasn't able to finish that sentence. So I think that's a statement in itself.

· 12 hours ago
..VB6 was to windows dev?

People with very little competence could and did get things done, but it was a mess underneath.

pjmlp · 14 hours ago
I love how people speak about Dreamweaver in the past tense, while Adobe keeps getting money for it:

https://developer.adobe.com/dreamweaver/

And yes, as you can imagine from the kind of comments I make regarding high-level productive tooling and languages, I was a big Dreamweaver fan back in the 2000s.

girvo · 19 hours ago
I miss Dreamweaver. Combining it with Fireworks was a crazy productive combo for me back in the mid 00’s!

My first PHP scripts and games were written using nothing more than Notepad too, funnily enough.

panzi · 17 hours ago
Back in the early 00s I brought gvim.exe on a floppy disk to school because I refused to write XSLT, HTML, CSS, etc without auto-indent or syntax highlighting.
The craft vs practical tension with LLMs is interesting. We've found LLMs excel when there's a clear validation mechanism - for security research, the POC either works or it doesn't. The LLM can iterate rapidly because success is unambiguous.

Where it struggles: problems requiring taste or judgment without clear right answers. The LLM wants to satisfy you, which works great for 'make this exploit work' but less great for 'is this the right architectural approach?'

The craftsman answer might be: use LLMs for the systematic/tedious parts (code generation, pattern matching, boilerplate) while keeping human judgment for the parts that matter. Let the tool handle what it's good at, you handle what requires actual thinking.

> The way Bryan approaches an LLM is super different to how a 2025 junior engineer does so

This is a key difference. I've been writing software professionally for over two decades. It took me quite a long time to overcome certain invisible (to me) hesitations and objections to using LLMs in software development workflows. At some point the realization came to me that this is simply the new way of doing things, and that from this point onward these tools will be deeply embedded in and synonymous with programming work. Recognizing this phenomenon for what it is somehow made me feel young again -- perhaps that's just the crust breaking around a calcified grump, but I do appreciate being able to tap into that all the same.

For the other non-native speakers wondering, "fly" means your trouser zipper.

He surely has his fly closed when cutting through the hype with reflection and pragmatism (without the extreme positions on both sides often seen).

I was also confused when I read that sentence. Wikipedia has an article on it: https://en.wikipedia.org/wiki/Fly_(clothing)
I found it funny that in a sentence that mentions "those who can recognize an LLM’s reveals", a few words later, there's an em-dash. I've often used em-dashes myself, so I find it a bit annoying that use of em-dashes is widely considered to be an AI tell.
The em-dash alone is not an LLM-reveal -- it's how the em-dash is used to pace a sentence. In my experience, with an LLM, em-dashes are used to even pacing; for humans (and certainly, for me!), the em-dash is used to deliberately change pacing -- to introduce a pause (like that one!), followed by a bit of a (metaphorical) punch. The goal is to have you read the sentence as I would read it -- and I think if you have heard me speak, you can hear me in my writing.
Too much has been written about em-dashes and LLMs, but I'd highly recommend "If it cites em dashes as proof, it came from a tool" [1] by Scott Smitelli if you haven't read it.

It's a brilliant skewering of the 'em dash means LLM' heuristic as a broken trick.

1. https://www.scottsmitelli.com/articles/em-dash-tool/

It's funny that I've seen people argue both that LLMs are exclusively useful to beginners who know next to nothing, and also that they are only useful if you are a 50+ YoE veteran at the top of their craft who has been programming on punch cards since they were 5 years old.

I wonder which of these camps are right.

Both camps, for different reasons.

For novices, LLMs are infinitely patient rubber ducks. They unstick the stuck; helping people past the coding and system management hurdles that once required deep dives through Stack Overflow and esoteric blog posts. When an explanation doesn’t land, they’ll reframe until one does. And because they’re confidently wrong often enough, learning to spot their errors becomes part of the curriculum.

For experienced engineers, they’re tireless boilerplate generators, dynamic linters, and a fresh set of eyes at 2am when no one else is around to ask. They handle the mechanical work so you can focus on the interesting problems.

The caveat for both: intentionality matters. They reward users who know what they’re looking for and punish those who outsource judgment entirely.

Interesting tension between craft and speed with LLMs. I've been building with AI assistance for the past week (terminal clients, automation infrastructure) and found the key is: use AI for scaffolding and boilerplate, but hand-refine anything customer-facing or complex. The 'intellectual fly open' problem is real when you just ship AI output directly. But AI + human refinement can actually enable better craft by handling the tedious parts. Not either/or, but knowing which parts deserve human attention vs which can be delegated.
keyle · 19 hours ago
> That junior engineer possibly hasn't programmed without the tantalizing, even desperately tempting option to be assisted by an LLM.

This gives me somewhat of a knee jerk reaction.

When I started programming professionally in the 90s, the internet came of age, and I remember being told "in my days, we had books and we remembered things", which of course is hilarious because today you can't possibly retain ALL the knowledge needed to be a software engineer, given the sheer size of the knowledge required to produce a meaningful product. It's too big and it moves too fast.

There was this long argument that you should know things and not have to look it up all the time. Altavista was a joke, and Google was cheating.

Then syntax highlighting came around, and there'd always be a guy going "yeah nah, you shouldn't need syntax highlighting to program, your screen looks like a Christmas tree".

Then we got stuff like auto-complete, and it was amazing, the amount of keystrokes we saved. That too, was seen as heresy by the purists (followed later by LSP - which many today call heresy).

That reminds me also, back in the day, people would have entire encyclopaedia collections on DVDs. Did they use them? No. But they criticised Wikipedia for being inferior. Look at today, though.

Same thing with LLMs. Whether you use them as a powerful context based auto-complete, as a research tool faster than wikipedia and google, as rubber-duck debugger, or as a text generator -- who cares: this is today, stop talking like a fossil.

It's 2025 and junior developers can't work without LSP and LLMs? It's fine. They're not in front of a 386 DX33 with one copy of K&R C and a blue EDIT screen. They have massive challenges ahead of them, the IT world is in complete shambles, and it's impossible to decipher how anything is made, even open source.

Today is today. Use all the tools at hand. Don't shame kids for using the best tools.

We should be talking about sustainability of such tools rather than what it means to use them (cf. enshittification, open source models etc.)

sifar · 19 hours ago
It is not clear though, which tools enable and which tools inhibit your development at the beginning of your journey.
keyle · 19 hours ago
Agreed, although LLMs definitely qualify as enabling developers compared to <social media, Steam, consoles, and other distractions> of today.

The Internet itself is full of distractions. My younger self spent a crazy amount of time on IRC. So it's not different than spending time on say, Discord today.

LLMs have a pretty direct parallel with Google: the quality of the response has much to do with the quality of the prompt. If anything, it's the overwhelming nature of LLMs that might be the problem. Back in the day, if you had, say, library access, the problem was knowing what to look for. Discoverability with LLMs is exponential.

As for LLM as auto-complete, there is an argument to be made that typing a lot reinforces knowledge in the human brain like writing. This is getting lost, but with productivity gains.

girvo · 19 hours ago
Watching my juniors constantly fight the nonsense auto completion suggestions their LLM editor of choice put in front of them, or worse watching them accept it and proceed to get entirely lost in the sauce, I’m not entirely convinced that the autocompletion part of it is the best one.

Tools like Claude code with ask/plan mode seem to be better in my experience, though I absolutely do wonder about the lack of typing causing a lack of memory formation

A rule I set myself a long time ago was to never copy paste code from stack overflow or similar websites. I always typed it out again. Slower, but I swear it built the comprehension I have today.

I spent the first two years or so of my coding career writing PHP in notepad++ and only after that switched to an IDE. I rarely needed to consult the documentation on most of the weird quirks of the language because I'd memorized them.

Nowadays I'm back to a text editor rather than an IDE, though fortunately one with much more creature comforts than n++ at least.

I'm glad I went down that path, though I can't say I'd really recommend it, as things felt a bit simpler back then.

keyle · 17 hours ago
> Watching my juniors constantly fight the nonsense auto completion suggestions their LLM editor of choice put in front of them, or worse watching them accept it and proceed to get entirely lost in the sauce, I’m not entirely convinced that the autocompletion part of it is the best one.

That's not an LLM problem, they'd do the same thing 10 years ago with stack overflow: argue about which answer is best, or trust the answer blindly.

girvo · 15 hours ago
No, it is qualitatively different because it happens in-line and much faster. If it’s not correct (which it seems it usually isn’t), they spend more time removing whatever garbage it autocompleted.
People do it with plain autocomplete as well, so I guess there's not that much of a difference with LLMs. It likely depends on the language, but people who are inexperienced in C++ can over-rely on autocomplete to the point that it looks hilarious, if you ever have a chance to sit next to them helping to debug something, for example.
girvo · 43 minutes ago
For sure, but these new tools spit out a lot more and a lot faster, and it’s usually correct “enough” that the compiler won’t yell. It’s been wild to see its suggestions be wrong far more often than they are right, so I wonder how useful they really are at all.

Normal auto complete plus a code tool like Claude Code or similar seem far more useful to me.

> never copy paste code from stack overflow

I have the same policy. I do the same thing for example code in the official documentation. I also put in a comment linking to the source if I end up using it. For me, it’s like the RFD says, it’s about taking responsibility for your output. Whether you originated it or not, you’re the reason it’s in the codebase now.

> but I swear it built the comprehension I have today.

For interns/junior engineers, the choice is: comprehension vs. career.

And I won't be surprised if most of them go with career now, and comprehension... well, maybe tomorrow (or never).

I have worked with a lot of junior engineers, and I’ll take comprehension any day. Developing their comprehension is a huge part of my responsibility to them and to the company. It’s pretty wasteful to take a human being with a functioning brain and ask them to churn out half understood code that works accidentally. I’m going to have to fix that eventually anyway, so why not get ahead of it and have them understand it so they can fix it instead of me?
I don’t think that’s the dichotomy. I’ve been in charge of hiring at a few companies, and comprehension is what I look for 10 times out of 10.
There are plenty of companies today where "not using AI enough" is a career problem.

It shouldn't be, but it is.

Well, you could get "interview-optimized" interviewees with impressive-looking mini-projects.
LLMs are simultaneously billed as the promised solution for most of the expected economic growth, and as a tool to improve programmer productivity and skill, yet here they are framed as merely better than doomscrolling?

That comparison undermines the integrity of the argument you are trying to make.

>"in my days, we had books and we remembered things" which of course is hilarious

it isn't hilarious, it's true. My father (now in his 60s) who came from a blue collar background with very little education taught himself programming by manually copying and editing software out of magazines, like a lot of people his age.

I teach students now who have access to all the information in the world, but a lot of them are quite literally so scatterbrained and heedless that they can't process anything that isn't catered to them. Not having working focus and memory is like having muscle atrophy of the mind; you just turn into a vegetable. Professors across disciplines have seen a decline in student abilities, and for several decades now, not just due to LLMs.

Information 30 years ago was more difficult to obtain. It required manual labor, but by today's standards there was not much information to be consumed. Today we have the opposite: a vast amount of information that is easy to obtain, but to process? Not so much. Decline is unavoidable. Human intelligence isn't increasing at the pace advancements are made.
pjmlp · 14 hours ago
Ah, but let's do leetcode on a whiteboard as the interview, re-balancing a red-black tree, regardless of how long those people have been in the industry and the job position they are actually applying for.
> "in my days, we had books and we remembered things" which of course is hilarious because today you can't possibly retain ALL the knowledge needed to be software engineer

Reading books was never about knowledge. It was about knowhow. You didn't need to read all the books. Just some. I don't know how many developers I met who would keep asking questions that would be obvious to anyone who had read the book. They never got the big picture and just wasted everyone's time, including their own.

"To know everything, you must first know one thing."

> When I started programming professionally in the 90s, the internet came of age and I remember being told "in my days, we had books and we remembered things" which of course is hilarious because today you can't possibly retain ALL the knowledge needed to be software engineer due to the sheer size of knowledge required today to produce a meaningful product. It's too big and it moves too fast.

But I mean, you can get by without memorizing stuff sure, but memorizing stuff does work out your brain and does help out in the long run? Isn't it possible we've reached the cliff of "helpful" tools to the point we are atrophying enough to be worse at our jobs?

Like, reading is surely better for the brain than watching TV. But constant cable TV wasn't enough to ruin our brains. What if we've got to the point it finally is enough?

I'm sure I'm biased by my age (mid 40s), but I think you are onto something there. What if this constant decline in how people learn (on average) is not just a grumpy-old-man feeling? What if it's something real, that was smoothed out by the sheer increase of the student population between 1960 and 2010 and the improvements in tooling?
As usual with Oxide's RFDs, I found myself vigorously head-nodding while reading. Somewhat rarely, I found a part that I found myself disagreeing with:

> Unlike prose, however (which really should be handed in a polished form to an LLM to maximize the LLM’s efficacy), LLMs can be quite effective writing code de novo.

Don't the same arguments against using LLMs to write one's prose also apply to code? Was the structure of the code and the ideas within it the engineer's? Or was it from the LLM? And so on.

Before I'm misunderstood as a LLM minimalist, I want to say that I think they're incredibly good at solving for the blank page syndrome -- just getting a starting point on the page is useful. But I think that the code you actually want to ship is so far from what LLMs write, that I think of it more as a crutch for blank page syndrome than "they're good at writing code de novo".

I'm open to being wrong and want to hear any discussion on the matter. My worry is that this is another one of the "illusion of progress" traps, similar to the one that currently fools people with the prose side of things.

Writing is an expression of an individual, while code is a tool used to solve a problem or achieve a purpose.

The more examples of different types of problems being solved in similar ways present in an LLM's dataset, the better it gets at solving problems. Generally speaking, if it's a solution that works well, it gets used a lot, so "good solutions" become well represented in the dataset.

Human expression, however, is diverse by definition. The expression of the human experience is the expression of a data point on a statistical field with standard deviations the size of chasms. An expression of the mean (which is what an LLM does) goes against why we care about human expression in the first place. "Interesting" is a value closely paired with "different".

We value diversity of thought in expression, but we value efficiency of problem solving for code.

There is definitely an argument to be made that LLM usage fundamentally restrains an individual from solving unsolved problems. It also doesn't consider the question of "where do we get more data from".

>the code you actually want to ship is so far from what LLMs write

I think this is a fairly common consensus, and my understanding is the reason for this issue is limited context window.

I argue that the intent of an engineer is contained coherently across the code of a project. I have yet to get an LLM to pick up on the deeper idioms present in a codebase that help constrain the overall solution towards these more particular patterns. I’m not talking about syntax or style, either. I’m talking about e.g. semantic connections within an object graph, understanding what sort of things belong in the data layer based on how it is intended to be read/written, etc. Even when I point it at a file and say, “Use the patterns you see there, with these small differences and a different target type,” I find that LLMs struggle. Until they can clear that hurdle without requiring me to restructure my entire engineering org they will remain as fancy code completion suggestions, hobby project accelerators, and not much else.
Very well stated.

One difference is that clichéd prose is bad and clichéd code is generally good.

Depends on what your prose is for. If it's for documentation, then prose which matches the expected tone and form of other similar docs would be clichéd in this perspective. I think this is a really good use of LLMs - making docs consistent across a large library / codebase.
I have been testing agentic coding with Claude 4.5 Opus and the problem is that it's too good at documentation and test cases. It's thorough in a way that it goes out of scope, so I have to edit it down to increase the signal-to-noise.
girvo · 18 hours ago
The “change capture”/straitjacket-style tests LLMs like to output drive me nuts. But humans write those all the time too, so I shouldn’t be that surprised either!
What do these look like?

  1. Take every single function, even private ones.
  2. Mock every argument and collaborator.
  3. Call the function.
  4. Assert the mocks were called in the expected way.
These tests help you find inadvertent changes, yes, but they also create constant noise about changes you intend.
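For concreteness, here is a minimal Python sketch of that pattern (the function and names are hypothetical, unittest.mock assumed); it pins down how collaborators are called rather than any observable outcome:

    from unittest.mock import Mock

    def send_welcome_email(user, mailer, templates):
        # Hypothetical function under test: render a template and send it.
        body = templates.render("welcome", name=user.name)
        mailer.send(to=user.email, body=body)

    def test_calls_collaborators_in_the_expected_way():
        user, mailer, templates = Mock(), Mock(), Mock()
        user.name, user.email = "Ada", "ada@example.com"
        templates.render.return_value = "hi Ada"

        send_welcome_email(user, mailer, templates)

        # Asserts the implementation's call pattern, not an observable outcome;
        # any refactor that changes how the collaborators are wired breaks this test.
        templates.render.assert_called_once_with("welcome", name="Ada")
        mailer.send.assert_called_once_with(to="ada@example.com", body="hi Ada")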
girvo · 45 minutes ago
You beat me to it, and yep these are exactly it.

“Mock the world then test your mocks”, I’m simply not convinced these have any value at all after my nearly two decades of doing this professionally

These tests also break encapsulation in many cases because they're not testing the interface contract, they're testing the implementation.
Juniors on one of the teams I work with only write this kind of test. It’s tiring, and I have to tell them to test the behaviour, not the implementation. And yet every time they do the same thing. Or rather, their AI IDE spits these out.
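A sketch of what "test the behaviour, not the implementation" can look like for the same hypothetical example as above: swap the mocks for a tiny fake and assert on the outcome, so refactoring the internals doesn't break the test.

    from dataclasses import dataclass, field

    @dataclass
    class User:
        name: str
        email: str

    @dataclass
    class FakeMailer:
        outbox: list = field(default_factory=list)

        def send(self, to, body):
            self.outbox.append((to, body))

    class PlainTemplates:
        def render(self, template, **kwargs):
            return f"Welcome, {kwargs.get('name', 'friend')}!"

    def send_welcome_email(user, mailer, templates):
        # Same hypothetical function as in the earlier sketch.
        body = templates.render("welcome", name=user.name)
        mailer.send(to=user.email, body=body)

    def test_welcome_email_reaches_the_user():
        mailer = FakeMailer()
        send_welcome_email(User("Ada", "ada@example.com"), mailer, PlainTemplates())

        # Assert the observable outcome, not the call pattern.
        [(to, body)] = mailer.outbox
        assert to == "ada@example.com"
        assert "Ada" in body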
If the goal is to document the code and it gets sidetracked and focuses on only certain parts, it failed the test. It just further proves LLMs are incapable of grasping meaning and context.
A problem I’ve found with LLMs for docs is that they are like ten times too wordy. They want to document every path and edge case rather than focusing on what really matters.

It can be addressed with prompting, but you have to fight this constantly.

pxc · 8 hours ago
> A problem I’ve found with LLMs for docs is that they are like ten times too wordy

This is one of the problems I have with LLM-generated code as well. It's almost always between 5x and 20x (!) as long as it needs to be. Though in the case of code verbosity, it's usually not because of thoroughness so much as extremely bad style.

I think probably my most common prompt is "Make it shorter. No more than ($x) (words|sentences|paragraphs)."
pxc · 7 hours ago
I've never been able to get that to work. LLMs can't count; they don't actually know how long their output is.
dcre · 20 hours ago
Docs also often don’t have anyone’s name on them, in which case they’re already attributed to an unknown composite author.
I guess to follow up slightly more:

- I think the "if you use another model" rebuttal is becoming like the No True Scotsman of the LLM world. We can get concrete and discuss a specific model if need be.

- If the use case is "generate this function body for me", I agree that that's a pretty good use case. I've specifically seen problematic behavior for the other ways I'm seeing it OFTEN used, which is "write this feature for me", or trying to one shot too much functionality, where the LLM gets to touch data structures, abstractions, interface boundaries, etc.

- To analogize it to writing: They shouldn't/cannot write the whole book, they shouldn't/cannot write the table of contents, they cannot write a chapter, IMO even a paragraph is too much -- but if you write the first sentence and the last sentence of a paragraph, I think the interpolation can be a pretty reasonable starting point. Bringing it back to code for me means: function bodies are OK. Everything else gets questionable fast IME.

My suspicion is that this is a form of the paradox where you can recognize that the news being reported is wrong when it is on a subject in which you are an expert but then you move onto the next article on a different subject and your trust resumes.

Basically if you are a software engineer you can very easily judge quality of code. But if you aren’t a writer then maybe it is hard for you to judge the quality of a piece of prose.

> I think that the code you actually want to ship is so far from what LLMs write

It depends on the LLM, I think. A lot of people have a bad impression of them as a result of using cheap or outdated LLMs.

There are cases where I would start the coding process by copy-pasting existing code (e.g. test suites, new screens in the UI), and this is where LLMs work especially well and produce code that is, the majority of the time, production-ready as-is.

A common prompt I use is approximately ”Write tests for file X, look at Y on how to setup mocks.”

This is probably not ”de novo” and in terms of writing is maybe closer to something like updating a case study powerpoint with the current customer’s data.

themk · 18 hours ago
I recently published an internal memo which covered the same point, but I included code. I feel like you still have a "voice" in code, and it provides important cues to the reviewer. I also consider review to be an important learning and collaboration moment, which becomes difficult with LLM code.
dcre · 20 hours ago
In my experience, LLMs have been quite capable of producing code I am satisfied with (though of course it depends on the context — I have much lower standards for one-off tools than long-lived apps). They are able to follow conventions already present in a codebase and produce something passable. Whereas with writing prose, I am almost never happy with the feel of what an LLM produces (worth noting that Sonnet and Opus 4.5’s prose may be moving up from disgusting to tolerable). I think of it as prose being higher-dimensional — for a given goal, often the way to express it in code is pretty obvious, and many developers would do essentially the same thing. Not so for prose.
Try Opus 4.5; you'll be surprised. That might have been true for past versions of LLMs, but they have advanced a lot.
> LLMs are especially good at evaluating documents to assess the degree that an LLM assisted their creation!)

That's a bold claim. Do they have data to back it up? I'd only have confidence saying this after testing it against many LLM outputs, and does it really work for, e.g., the em-dash leaderboard of HN, or for people who tell an LLM not to use those 10 LLM-y writing cliches? I would need to see their reasoning on why they think this to believe it.

I am really surprised that people are surprised by this, and honestly the reference was so casual in the RFD because it's probably the way that I use LLMs the most (so very much coming from my own personal experience). I will add a footnote to the RFD to explain this, but just for everyone's benefit here: at Oxide, we have a very writing-intensive hiring process.[0] Unsurprisingly, over the last six months, we have seen an explosion of LLM-authored materials (especially for our technical positions). We have told applicants to be careful about doing this[1], but they do it anyway. We have also seen this coupled with outright fraud (though less frequently). Speaking personally, I spend a lot of time reviewing candidate materials, and my ear has become very sensitive to LLM-generated materials. So while I generally only engage an LLM to aid in detection when I already have a suspicion, they have proven adept. (I also elaborated on this a little in our podcast episode with Ben Shindel on using LLMs to explore the fraud of Aidan Toner-Rodgers.[2])

I wasn't trying to assert that LLMs can find all LLM-generated content (which feels tautologically impossible?), just that they are useful for the kind of LLM-generated content that we seek to detect.

[0] https://rfd.shared.oxide.computer/rfd/0003

[1] https://oxide.computer/careers

[2] https://oxide-and-friends.transistor.fm/episodes/ai-material...

I debated not writing this, as I planned on re-applying again, as oxide is in many ways a dream company for me, and didn't want this to hurt my chances if I could be identified and it was seen as negative or critical (I hope not, I'm just relaying my experience, as honestly as I can!), but I felt like I needed to make this post (my first on HN, a longtime lurker).

I applied in the last 6 months, and against my better judgement, encouraged by the perceived company culture, the various luminaries on the team, the varied technical and non-technical content on the podcasts, and my general (unfortunate) propensity for honesty, I was more vulnerable than normal in a tech application, and spent many hours writing it. (fwiw, it's not super relevant to what I'll get to, but you can and should assume I am a longtime Rust programmer (since 1.0) with successful open source libraries, even ones used by oxide, but also a very private person, no socials, no blogging, etc., so much to my chagrin, I assumed I would be a shoo-in :))

After almost 3 months, I was disappointed (and surprised if I'm being honest, hubris, indeed!) to receive a very bland, uninformative rejection email for the position, stating they received too many applications for the position (still not filled as of today!) and would not proceed at this time, and welcome to re-apply, etc. Let me state: this is fine, this is not my first rodeo! I have a well paying (taking the job would have been a significant paycut, but that's how much I wanted to work there!), albeit at the moment, unchallenging job at a large tech company. What I found particularly objectionable was that my writing samples (urls to my personal samples) were never accessed.

This is or could be signal for a number of things, but what was particularly disappointing was the heavy emphasis on writing in the application packet and the company culture, as e.g. reiterated by the founder I'm replying to, and yet my writing samples were never even read? I have been in tech for many years, seen all the bullshit in recruiting and hiring, performed interviews many times myself, so it wouldn't be altogether surprising that a first-line recruiter throws a resume into a reject pile for <insert reasons>, but then I have so many other questions - why the 3 months delay if tossed quickly, and if it truly was read by the/a founder or heavily scrutinized, as somewhat indicated by the post, why did they not access my writing samples? There are just more questions now.

All of this was bothersome, and if I'm being honest, made me question joining the company, but what really made me write this response is that I am now worried, given the content of the post I'm replying to, whether my application was flagged as LLM generated? I don't think my writing style is particularly LLMish, but in case that's in doubt, believe me or not, my application, and this response, does not have a single word from an LLM. This is all, sui generis, me, myself, and I. (This doesn't quite explain why my samples weren't accessed, but if I'm being charitable, perhaps the content of the application packet seemed of dubious provenance?)

Regardless, if it was flagged, I suppose the long and short of this little story is: are you sending applicants rejection letters noting this suspicion, at least as a courtesy? If I was the victim of a false positive, I would at least like to know. This isn't some last-ditch attempt (the rejection was many months ago) to get re-eval'd; I have a job, I can reapply in my own time, and even if this was an oversight or mistake (although not accessing the writing samples at all is somewhat of a red flag for me), there is no way they can contact me through this burner account. It's just, like, the principle of it, and the words needed to be said :) Thank you, and PS, even through it all, I (perhaps now guiltily) still love your podcast :D

I mean this nicely: please don't prostrate yourself for these companies. Please have some more respect for yourself.
I thought about it - a quick way to verify whether something was created with an LLM is to feed an LLM half of the text and then let it complete token by token. At every completion, check not just the next token but the n most probable next tokens. If one of them is the one you have in the text, pick it and continue. This way, I think, you can identify how "correct" the model is at predicting the text it hasn't yet seen.

I didn't test it and I'm far from an expert, maybe someone can challenge it?
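A minimal sketch of the non-generating variant of this idea (score each real token against the model's top-n predictions given the preceding text), using Hugging Face transformers with GPT-2 as an arbitrary stand-in for "an LLM":

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def top_n_hit_rate(text: str, n: int = 5, model_name: str = "gpt2") -> float:
        """Fraction of tokens in `text` that fall within the model's top-n predictions."""
        tok = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForCausalLM.from_pretrained(model_name)
        ids = tok(text, return_tensors="pt").input_ids          # (1, seq_len)
        with torch.no_grad():
            logits = model(ids).logits                          # (1, seq_len, vocab)
        # The prediction at position i is for token i+1, so align logits[:-1] with ids[1:].
        preds = logits[0, :-1].topk(n, dim=-1).indices          # (seq_len - 1, n)
        actual = ids[0, 1:].unsqueeze(-1)                       # (seq_len - 1, 1)
        hits = (preds == actual).any(dim=-1)
        return hits.float().mean().item()

A higher hit rate suggests more "predictable" text, but as the replies below note, this kind of perplexity-style check is sensitive to which model generated the text and is not a reliable detector on its own.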

That seems somewhat similar to perplexity based detection, although you can just get the probabilities of each token instead of picking n-best, and you don't have to generate.

It kinda works, but is not very reliable and is quite sensitive to which model the text was generated with.

This page has nice explanations:

https://www.pangram.com/blog/why-perplexity-and-burstiness-f...

I expect that, for values of n for which this test consistently reports "LLM-generated" on LLM-generated inputs, it will also consistently report "LLM-generated" on human-generated inputs. But I haven't done the test either so I could be wrong.
I would be surprised they have any data about this. There are so many ways LLMs can be involved, from writing everything, to making text more concise or just "simple proofreading". Detecting all this with certainty is not trivial and probably not possible with the current tools we have.
> Wherever LLM-generated code is used, it becomes the responsibility of the engineer. As part of this process of taking responsibility, self-review becomes essential: LLM-generated code should not be reviewed by others if the responsible engineer has not themselves reviewed it. Moreover, once in the loop of peer review, generation should more or less be removed: if code review comments are addressed by wholesale re-generation, iterative review becomes impossible.

My general procedure for using an LLM to write code, which is in the spirit of what is advocated here, is:

1) First, feed the existing relevant code into an LLM. This is usually just a few source files in a larger project.

2) Describe what I want to do, either giving an architecture or letting the LLM generate one. I tell it to not write code at this point.

3) Let it speak about the plan, and make sure that I like it. I will converse to address any deficiencies that I see, and I almost always do.

4) I then tell it to generate the code

5) I skim & test the code to see if it's generally correct, and have it make corrections as needed

6) Closely read the entire generated artifact at this point, and make manual corrections (occasionally automatic corrections like "replace all C style casts with the appropriate C++ style casts" then a review of the diff)

The hardest part for me is #6, where I feel a strong emotional bias towards not doing it, since I am not yet aware of any errors compelling such action.

This allows me to operate at a higher level of abstraction (architecture) and remove the drudgery of turning an architectural idea into written, precise code. But, when doing so, you are abandoning those details to a non-deterministic system. This is different from, for example, using a compiler or a higher-level VM language. With those other tools, you can understand how they work and rapidly have a good idea of what you're going to get, and you have robust assurances. Understanding LLMs helps, but not to the same degree.

I've found that your step 6 takes the vast majority of the time I spend programming with LLMs. Like 10X+ the combined total of time steps 1-5 take. And that's if the code the LLM produced actually works. If it doesn't work (which happens quite often), then even more handholding and corrections are needed. It's really a grind. I'm still not sure whether I am net saving time using these tools.

I always wonder about the people who say LLMs save them so much time: Do you just accept the edits they make without reviewing each and every line?

You can have the tool start by writing an implementation plan describing the overall approach and key details including references, snippets of code, task list, etc. That is much faster than a raw diff to review and refine to make sure it matches your intent. Once that's acceptable the changes are quick, and having the machine do a few rounds of refinement to make sure the diff vs HEAD matches the plan helps iron out some of the easy issues before human eyes show up. The final review is then easier because you are only checking for smaller issues and consistency with the plan that you already signed off on.

It's not magic though, this still takes some time to do.

I exclusively use the autocomplete in cursor. I hate reviewing huge chunks of llm code at one time. With the autocomplete, I’m in full control of the larger design and am able to quickly review each piece of llm code. Very often it generates what I was going to type myself.

Anything that involves math or complicated conditions I take extra time on.

I feel I’m getting code written 2 to 3 times faster this way while maintaining high quality and confidence

This is my preferred way as well. And when you think about it, it makes sense. With advanced autocomplete you are:

1. Keeping the context very small
2. Keeping the scope of the output very small

With the added benefit of keeping you in the flow state (and in my experience making it more enjoyable).

To anyone who hates LLMs: give autocomplete a shot (with a keybinding to toggle it if it annoys you; sometimes it's awful). It's really no different from typing it manually wrt quality etc, so the speedup isn't huge, but it feels a lot nicer.

Maybe it subjectively feels like 2-3x faster but in studies that measure it we tend to see smaller improvements like in the range of 20-30% faster. It could be that you are an outlier, of course.
2-3x faster on getting the code written. Fully completing a coding task maybe only 20-30% faster, if we count chasing down requirements, reviews, waiting for CI to pass so I can merge etc.
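A rough back-of-the-envelope check on how both numbers can be true at once (the fractions here are assumptions, not measurements from the thread):

    # If writing code is ~40% of an end-to-end task and that part gets 2.5x faster,
    # Amdahl's law puts the overall speedup at only ~30%.
    coding_share, coding_speedup = 0.40, 2.5
    overall = 1 / ((1 - coding_share) + coding_share / coding_speedup)
    print(f"{overall:.2f}x overall")  # ~1.32x, i.e. roughly 30% faster end to end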
If it's stuff I have been doing for years and isn't terribly complex, I've found it's generally quick to skim-review. I don't need to read every line; I can glance at it and know it's a loop and why, a function call, or whatever. If I see something unusual, I take that as an opportunity to learn.

I've seen LLMs write some really bad code a few times lately it seems almost worse than what they were doing 6 or 8 months ago. Could be my imagination but it seems that way.

qudat · 8 hours ago
Insert before 4: make it generate tests that fail, review, then have it implement and make sure the tests pass.

Insert before that: have it create tasks with beads, and force it to let you review before marking a task complete.

Don’t make manual corrections.

If you keep all edits to be driven by the LLM, you can use that knowledge later in the session or ask your model to commit the guidelines to long term memory.

The best way to get an LLM to follow style is to make sure that this style is evident in the codebase. Excessive instructions (whether through memories or AGENT.md) do not help as much.

Personally, I absolutely hate instructing agents to make corrections. It's like pushing a wet noodle. If there is lots to correct, fix one or two cases manually and tell the LLM to follow that pattern.

https://www.humanlayer.dev/blog/writing-a-good-claude-md

How the heck it does not upset your engineering pride and integrity, to limit your own contribution to verifying and touching up machine slop, is beyond me.

You obviously cannot emotionally identify with the code you produce this way; the ownership you might feel towards such code is nowhere near what meticulously hand-written code elicits.

csb6 · 16 hours ago
Strange to see no mention of potential copyright violations found in LLM-generated code (e.g. LLMs reproducing code from Github verbatim without respecting the license). I would think that would be a pretty important consideration for any software development company, especially one that produces so much free software.
Also, since LLM-generated content is not copyrightable, what happens to code you publish under a copyleft license? The entire copyleft system is based on the idea of a human holding copyright to copyleft code. Is a big chunk of it, the LLM part, basically public domain? How do you ensure there's enough human content to make it copyrightable and hence copyleftable?
> since LLM generated content is not copyrightable

That's not how it works. If you ask an LLM to write Harry Potter and it writes something that is 99% the same as Harry Potter, it isn't magically free of copyright. That would obviously be insane.

The legal system is still figuring out exactly what the rules are here but it seems likely that it's going to be on the LLM user to know if the output is protected by copyright. I imagine AI vendors will develop secondary search thingies to warn you (if they haven't already), and there will probably be some "reasonable belief" defence in the eventual laws.

Either way it definitely isn't as simple as "LLM wrote it so we can ignore copyright".

> it seems likely that it's going to be on the LLM user to know if the output is protected by copyright.

To me, this is what seems more insane! If you've never read Harry Potter, and you ask an LLM to write you a story about a wizard boy, and it outputs 80% Harry Potter - how would you even know?

> there will probably be some "reasonable belief" defence in the eventual laws.

This is probably true, but it's irksome to shift all blame away from the LLM producers, who use copyrighted data to peddle copyrighted output. This simply turns the business into copyright-infringement-as-a-service; what incentive would they have to actually build those "secondary search thingies" and build them well?

> it definitely isn't as simple as "LLM wrote it so we can ignore copyright".

Agreed. The copyright system is getting stress tested. It will be interesting to see how our legal systems can adapt to this.

> how would you even know?

The obvious way is by searching the training data for close matches. LLMs need to do that and warn you about it. Of course the problem is they all trained on pirated books and then deleted them...

But either way it's kind of a "your problem" thing. You can't really just say "I invented this great tool and it sometimes lets me violate copyright without realising. You don't mind do you, copyright holders?"

I think the poster is looking at it from the other way: purely machine-generated content is not generally copyrightable, even if it can violate copyright. So it's more a question of: can a copyleft license like the GPL actually protect something that's original but primarily LLM-generated? Should it do so?

(From what I understand, the amount of human input that's required to make the result copyrightable can be pretty small, perhaps even as little as selecting from multiple options. But this is likely to be quite a gray area.)

Has anything like this worked its way through the courts yet?
Yes, training is considered fair use, and output is non-copyrightable / public domain. With many asterisks and footnotes, of course.
Don't see how output being public domain makes sense when they could be outputting copyrighted code.

Shouldn't the rights extend forward and simply require the LLM-generated code to be deleted?

With many asterisks and footnotes. One of which being that if it literally output the exact code, of course that would be copyright infringement. Something that greatly resembled it but with minor changes would be a gray area.

Those kinds of cases, although they do happen, are exceptional. A typical output that doesn't line-for-line resemble a single training input is considered a new, but non-copyrightable, work.

(I'm not a lawyer)

You should be careful about speaking in absolute terms when talking about copyright.

There is nothing that prevents multiple people from owning copyright to identical works. This is also why copyright infringement is such a mess to litigate.

I'd also be interested in knowing why you think code generated by LLMs can't be copyrighted. That's quite a statement.

There's also the problem with copyright law and different jurisdictions.

First, you have to prove that it produced the copyrighted code. And the question is what counts as copyrighted code in the first place. A literal copy-paste from the source is easy, but I think 99% of the time this isn't the case.
Do current-generation LLMs do this? I suppose I mean "do this any more than human developers do".
>> Here's my question: why did the files that you submitted name Mark Shinwell as the author?

> Beats me. AI decided to do so and I didn't question it. I did ask AI to look at the OxCaml implementation in the beginning.

This shows that the problem with AI is philosophical, not practical

...what a remarkable thread.
Right? If it's really true that some random person without compiler engineering experience implemented a completely new feature in the OCaml compiler by prompting an LLM to produce the code for them, then I think it really is remarkable.
Oh wow, is that what you got from this?

It seems more like an inexperienced guy asked the LLM to implement something and the LLM just output what an experienced guy did before, and it even gave him the credit.

Copyright notices and signatures in generative AI output are generally a result of the expectation created by the training data that such things exist, and are generally unrelated to how much the output corresponds to any particular piece of training data, and especially to who exactly produced that work.

(It is, of course, exceptionally lazy to leave such things in if you are using the LLM to assist you with a task, and can cause problems of false attribution. Especially in this case where it seems to have just picked a name of one of the maintainers of the project)

Did you take a look at the code? Given your response I figure you did not because if you did you would see that the code was _not_ cloned but genuinely compiled by the LLM.
It’s one thing for you (yes, you, the user using the tool) to generate code you don’t understand for a side project or one off tool. It’s another thing to expect your code to be upstreamed into a large project and let others take on the maintenance burden, not to mention review code you haven’t even reviewed yourself!

Note: I, myself, am guilty of forking projects, adding some simple feature I need with an LLM quickly because I don’t want to take the time to understand the codebase, and using it personally. I don’t attempt to upstream changes like this and waste maintainers’ time until I actually take the time myself to understand the project, the issue, and the solution.

What are you talking about? It was a ridiculously useful debugging feature that nobody in their right mind would block because of "added maintenance". The MR was rejected purely for political/social reasons.
The guide is generally very well thought out, but I see an issue in this part:

It sets the rule that things must be actually read when there’s a social expectation (code interviews for example) but otherwise… remarks that use of LLMs to assist comprehension has little downside.

I find two problems with this:

- there is an incoherence there. If LLMs are flawless at reading and summarization, there is no difference from reading the original. And if they aren't flawless, then that flaw also extends to the non-social stuff.

- in practice, I haven't found LLMs that good as reading assistants. I've sent them to check a linked doc and they've just read the index and inferred the context, for example. Just yesterday I asked for a comparison of three technical books on a similar topic, and it wrongly guessed the third one rather than follow the three links.

There is a significant risk in placing a translation layer between content and reader.

> It sets the rule that things must be actually read when there’s a social expectation (code interviews for example) but otherwise… remarks that use of LLMs to assist comprehension has little downside.

I think you got this backwards, because I don't think the RFD said that at all. The point was about a social expectation for writing, not for reading.

This is what I’m referencing:

>using LLMs to assist comprehension should not substitute for actually reading a document where such reading is socially expected.

  • gpm
  • ·
  • 18 hours ago
  • ·
  • [ - ]
> Just yesterday I asked for a comparison of three technical books on a similar topic, and it wrongly guessed the third one rather than follow the three links.

I would consider this a failure in their tool use capabilities, not their reading ones.

To use them to read things (without relying on their much less reliable tool use), take the thing and put it in the context window yourself.
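Concretely, something like this (using the OpenAI Python SDK; the model name, file path, and question are just placeholders):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Put the document into the context window yourself instead of
# relying on the model's (less reliable) link-following tools.
with open("datasheet.txt", encoding="utf-8") as f:
    document = f.read()

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[
        {"role": "system", "content": "Answer strictly from the provided document."},
        {"role": "user", "content": f"Document:\n{document}\n\nQuestion: what are the key points?"},
    ],
)
print(response.choices[0].message.content)
```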

They still aren't perfect of course, but they are reasonably good.

Three whole books likely exceeds their context window size of course, I'd take this as a sign that they aren't up to a task of that magnitude yet.

>Three whole books likely exceeds their context window size of course

This was not “read all three books”, this was “check these three links with the (known) book synopsis/reviews there” and it made up the third one.

>I would consider this a failure in their tool use capabilities, not their reading ones.

I'd give it to you if I got an error message, but the text being enhanced with wrong-but-plausible data is clearly a failure of reliability.

> LLM-generated writing undermines the authenticity of not just one’s writing but of the thinking behind it as well.

I think this captures a key point, but I'm not sure of the right way to articulate it.

A human-written comment may be worth something, but an LLM-generated one is cheap/worthless.

The nicest phrase capturing the thought I saw was: "I'd rather read the prompt".

It's probably just as good to let an LLM generate it again, as it is to publish something written by an LLM.

I'll give it a shot.

Text, images, art, and music are all methods of expressing our internal ideas to other human beings. Our thoughts are the source, and these methods are how they are expressed. Our true goal in any form of communication is to understand the internal ideas of others.

An LLM expresses itself in all the same ways, but the source doesn't come from an individual - it comes from a giant dataset. This could be considered an expression of the aggregate thoughts of humanity, which is fine in some contexts (like retrieval of ideas and information highly represented in the data/world), but not when presented in a context of expressing the thoughts of an individual.

LLMs express the statistical summation of everyone's thoughts. They present the mean, when what we're really interested in are the data points a couple of standard deviations away from the mean. That's where all the interesting, unique, and thought-provoking ideas are. Diversity is at the core of the human experience.

---

An interesting paradox is the use of LLMs for translation into a non-native language. LLMs are actively being used to better express an individual's ideas, using words better than they can with their limited language proficiency, but for those of us on the receiving end, we interpret the expression as mirroring the source and have immediate suspicions about the legitimacy of the individual's thoughts. Which is a little unfortunate for those who just want to express themselves better.

I think more people should read Naur's "programming as theory building".

A comment is an attempt to more fully document the theory the programmer has. Not all theory can be expressed in code. Both code and comment are lossy artefacts that are "projections" of the theory into text.

LLMs currently, I believe, cannot have a theory of the program. But they can definitely perform a useful simulacrum of such. I have not yet seen an LLM generated comment that is truly valuable. Of course, lots of human generated comments are not valuable either. But the ceiling for human comments is much, much higher.

This is something that I feel rather conflicted about, because while I greatly dislike the LLM-slop-style writing that so many people are trying to abuse our attention with, I’ve started noticing that there are a large number of people (varying across “audiences”/communities/platforms) who don’t really notice it, or at least whoever is behind the slop is making the “right kind” of slop so that they don’t.

For example, I recently was perusing the /r/SaaS subreddit and could tell that most of the submissions were obviously LLM-generated, but often by telling a story that was meant to spark outrage, resonate with the “audience” (eg being doubted and later proven right), and ultimately conclude by validating them by making the kind of decision they typically would.

I also would never pass this off as anything else, but I’ve been finding it effective to have LLMs write certain kinds of documentation or benchmarks in my repos, just so that they/I/someone else have access to metrics and code snippets that I would otherwise not have time to write myself. I’ve seen non-native English speakers write pretty technically useful/interesting docs and tech articles by translating through LLMs too, though a lot more bad attempts than good (and you might not be able to tell if you can’t speak the language)…

Honestly the lines are starting to blur ever so slightly for me. I’d still not want someone using an LLM to chat with me directly, but if someone could have an LLM build a simple WASM/interesting game and then write an interesting/informative/useful article about it, or steer it into doing so… I might actually enjoy it. And not because the prompt was good: instructions telling an LLM to go make a game and do a write-up don’t help me as much or in the same way as being able to quickly see how well it went and any useful takeaways/tricks/gotchas it uncovered. It would genuinely be giving me valuable information and probably wouldn’t be something I’d speculatively try or run myself.

  • leobg
  • ·
  • 10 hours ago
  • ·
  • [ - ]
> I'd rather read the prompt.

That’s what I think when I see a news headline. What are you writing? Who cares. WHY are you writing it — that is what I want to know.

One thing I’ve noticed is that when I’m writing something I consider insightful or creative with LLM autocompletion, the machine can’t successfully predict any words in the sentence except maybe the last one.

They seem to be good at either spitting out something very average, or something completely insane. But something genuinely indicative of the spark of intelligence isn’t common at all. I’m happy to know that while my thoughts are likely not original, they are at least not statistically likely.

  • jhhh
  • ·
  • 20 hours ago
  • ·
  • [ - ]
I've had the same thought about 'written' text with an LLM. If you didn't spend time writing it, don't expect me to read it. I'm glad he seems to be taking a hard stance on that, saying they won't use LLMs to write non-code artifacts. This principle extends to writing code as well, to some degree. You shouldn't expect other people to peer review 'your' code which was simply generated because, again, you spent no time making it. You have to be the first reviewer. Whether these cultural norms are held firmly remains to be seen (I don't work there), but I think they represent thoughtful application of emerging technologies.
  • an_ko
  • ·
  • 20 hours ago
  • ·
  • [ - ]
I would have expected at least some consideration of public perception, given the extremely negative opinions many people hold about LLMs being trained on stolen data. Whether it's an ethical issue or a brand hazard depends on your opinions about that, but it's definitely at least one of those currently.
I made the mistake of first reading this as a document intended for all in spite of it being public.

This is a technical document that is useful in illustrating how the guy who gave a talk once that I didn’t understand but was captivated by and is well-respected in his field intends to guide his company’s use of the technology so that other companies and individual programmers may learn from it too.

I don’t think the objective was to take any outright ethical stance, but to provide guidance about something ostensibly used at an employee’s discretion.

He speaks of trust and LLMs breaking that trust. Is this not what you mean, but by another name?

> First, to those who can recognize an LLM’s reveals (an expanding demographic!), it’s just embarrassing — it’s as if the writer is walking around with their intellectual fly open. But there are deeper problems: LLM-generated writing undermines the authenticity of not just one’s writing but of the thinking behind it as well. If the prose is automatically generated, might the ideas be too? The reader can’t be sure — and increasingly, the hallmarks of LLM generation cause readers to turn off (or worse).

> Specifically, we must be careful to not use LLMs in such a way as to undermine the trust that we have in one another

> our writing is an important vessel for building trust — and that trust can be quickly eroded if we are not speaking with our own voice

> it is presumed that of the reader and the writer, it is the writer that has undertaken the greater intellectual exertion. (That is, it is more work to write than to read!)

This applies to natural language, but, interestingly, the opposite is true of code (in my experience and that of other people that I've discussed it with).

That's because embarrassingly bad writing is useless, while embarrassingly bad code can still make the computer do (roughly) the right thing and lets you tick off a Jira ticket. So we end up having way more room for awful code than for awful prose.

Reading good code can be a better way to learn about something than reading prose. Writing code like that takes some real skill and insight, just like writing clear explanations.

Some writing is functional, e.g. a letter notifying someone of some information. For that type of writing even bad quality can achieve its purpose. Indeed probably the majority of words written are for functional reasons.
See: Kernighan's Law

> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?

https://www.laws-of-software.com/laws/kernighan/

I think people misunderstand this quote. Cleverness in this context refers to complexity, and generally stems from falling in love with some complex mechanism you dream up to solve a problem rather than challenging yourself to create something simpler and easier to maintain. Bolting together bits of LLM-created code is far more likely to be “clever” rather than good.
What an amazing quote!
> assurance that the model will not use the document to train future iterations of itself.

believing this in 2025 is really fascinating. this is like believing Meta won’t use info they (il)legally collected about you to serve you ads

> Wherever LLM-generated code is used, it becomes the responsibility of the engineer. As part of this process of taking responsibility, self-review becomes essential: LLM-generated code should not be reviewed by others if the responsible engineer has not themselves reviewed it.

I think the review by the prompt writer should be at a higher level than another person who reviews the code.

If I know how to do something, it is easier for me to avoid mistakes while doing it. When I'm reviewing it, it requires different pathways in my brain. Since there is code out there I'm drawn to that path, and I might not always spot the problem points. Or the code might be written in a way that I don't recognize, but that still exhibits the same mistake.

In the past, as a reviewer I used to be able to count on my colleagues' professionalism to be a moat.

The size of the moat is inversely proportional to the amount of LLM-generated code in a PR / project. At a certain moment you can no longer guarantee that you stand behind everything.

Combine that with the push to do more, faster, with less, and we're increasing the amount of tech debt we're taking on.

> Ironically, LLMs are especially good at evaluating documents to assess the degree that an LLM assisted their creation

Is there any evidence for this?

If anything my experience has been the opposite of this. LLM detection is guesswork for an LLM.
no
> Oxide employees bear responsibility for the artifacts we create, whatever automation we might employ to create them.

Yes, allow the use of LLMs, encourage your employees to use them to move faster by rewarding "performance" regardless of risks, but make sure to place responsibility for failure upon them, so that when it happens, the company culture is not blamed.

Nothing new here. Antirez, for one, has taken a similar stance on his YouTube channel, which has material on the topic. But it's worthwhile having a document like this publicly available from a company that the tech crowd seems to respect.

<offtopic> The "RFD" here stands for "Reason/Request for Decision" or something else? (Request for Decision doesn't have a nice _ring_ on it tbh). I'm aware of RFCs ofc and the respective status changes (draft, review, accepted, rejected) or ADR (Architectural Decision Record) but have not come across the RFD acronym. Google gave several different answers. </offtopic> </offtopic>

It stands for ‘Request for Discussion’:

https://rfd.shared.oxide.computer/rfd/0001

Thanks.
  • hexo
  • ·
  • 13 hours ago
  • ·
  • [ - ]
"LLMs are amazingly good at writing code" that one was good. I cant stop laughing.
I agree with your sentiment, but I do find it amazing that the underlying techniques of inference can emit code that is as apparently coherent as it is. (This does not imply actual coherence.)
I wrote an entire multiplayer game in XNA that I've tried repeatedly to get LLMs to translate to javascript

it's just utterly hopeless how bad they are at doing it

even if I break it down into parts, once you get into the stuff that actually matters, i.e. the physics, event handling, and game logic induced by events, it just completely falls apart 100% of the time

I felt this the other day. I wouldn't even consider my example exotic: p2p systems using Electron? It just couldn't figure out how to work with YJS correctly.

These things aren't hard if you're familiar with the documentation and have made them before, but there is an extreme dearth of information about them compared to web dev tutorials.

> Large language models (LLMs) are an indisputable breakthrough of the last five years

Actually a lot of people dispute this, and I'm sure the author knows that!

  • tizzy
  • ·
  • 12 hours ago
  • ·
  • [ - ]
The idea that LLMs are amazing at comprehension but we are expected to read original documents seems contradictory to me? I’m also wary of using them as editors and losing the writer’s voice, as that feels heavily dependent on the prompt and on whether or not the writer does a final pass without any LLM. Asking someone else to re-write is losing your voice if you don’t have an opinion on how the re-write turns out.
  • ·
  • 9 hours ago
  • ·
  • [ - ]
What is the downside of using them to prototype? to generate throwaway code? What do we lose if we default to that behavior?
  • gpm
  • ·
  • 3 hours ago
  • ·
  • [ - ]
Time wasted on failed prototypes? Understanding that could have been generated by the act of prototyping?

Doesn't mean you shouldn't ever do so, but there are tradeoffs that become obvious as soon as you start attempting it.

  • j2kun
  • ·
  • 3 hours ago
  • ·
  • [ - ]
Read the article, which discusses this already, and maybe respond to that.
> LLM-generated code should not be reviewed by others if the responsible engineer has not themselves reviewed it.

To extend that: If the LLM is the author and the responsible engineer is the genuine first reviewer, do you need a second engineer at all?

Typically in my experience one review is enough.

Yeesss this is what I’ve been (semi-sarcastically) thinking about. Historically it’s one author and one reviewer before code gets shipped.

Why introduce a second reviewer and reduce the rumoured velocity gained by LLMs? After all, “it doesn’t matter what wrote the code” right.

I say let her rip. Or as the kids say, code goes brrr.

I disagree. Code review has a social purpose as well as a technical one. It reinforces a shared understanding of the code and requires one person to assure another that the code is ready for review. It develops consensus about design decisions and agreement about what the code is for. With only one person, this is impossible. “Code goes brrr” is a neutral property. It can just as easily take you to the wrong destination as the right one.
More eyes are better, but more importantly code review is also about knowledge dissemination. If only the original author and the LLM saw the code you have a bus factor of 1. If another person reviews the bus factor is closer to 2.
yes, obviously?

anyone who is doing serious enough engineering that they have the rule of "one human writes, one human reviews" wants two humans to actually put careful thought into a thing, and only one of them is deeply incentivised to just commit the code.

your suggestion means less review and worse incentives.

  • Yeask
  • ·
  • 9 hours ago
  • ·
  • [ - ]
anyone who is doing serious enough engineering is not using LLMs.
I wonder if they would be willing to publish the "LLMs at Oxide" advice, linked in the OP [1], but currently publicly inaccessible.

[1] https://github.com/oxidecomputer/meta/tree/master/engineerin...

Disclaimer: Oxide employee here.

To be honest there's really no secret sauce in there. It's primarily how to get started with agents, when to abandon your context and start anew, and advice on models, monitoring cost, and prompting. This is not to diminish the value of the information as it's good information written by great colleagues. I just wanted to note that most of the information can be obtained from the official AI provider documentation and blog posts from AI boosters like Thorsten Ball.

Thanks.
You're welcome. My colleague published the text for it: https://gist.github.com/david-crespo/5c5eaf36a2d20be8a3013ba...
Cool, thanks again to both of you. :-)
>Wherever LLM-generated code is used, it becomes the responsibility of the engineer. As part of this process of taking responsibility, self-review becomes essential: LLM-generated code should not be reviewed by others if the responsible engineer has not themselves reviewed it

By this article's own standards, now there are 2 authors who don't understand what they've produced.

This is exactly what the advice is trying to mitigate. At least as I see it, the responsible engineer (meaning the author, not some quality of the engineer) needs to understand the intent of the code they will produce. Then, if using an LLM, they must take full ownership of that code by carefully reviewing it or molding it until it reflects their intent. If at the end of this the “responsible” engineer does not understand the code, the advice has not been followed.
Oxide’s approach is interesting because it treats LLMs as a tool inside a much stricter engineering boundary. Makes me wonder how many teams would avoid chaos if they adopted the same discipline.
> LLMs are superlative at reading comprehension, able to process and meaningfully comprehend documents effectively instantly.

I couldn't disagree more. (In fact I'm shocked that Bryan Cantrill uses words like "comprehension" and "meaningfully" in relation to LLMs.)

Summaries provided by ChatGPT, conclusions drawn by it, contain exaggerations and half-truths that are NOT there in the actual original sources, if you bother to ask ChatGPT for those and read them yourself. If your question is only slightly suggestive, ChatGPT's tuning is all too happy to tilt the summary in your favor; it tells you what you seem to want to hear, based on the phrasing of your prompt. ChatGPT presents, using confident and authoritative language, total falsehoods and deceptive half-truths, after parsing human-written originals, whether the latter be natural language text or source code. I now only trust ChatGPT to recommend sources to me, and I read those -- especially the relevant-looking parts -- myself. ChatGPT has been tuned by its masters to be a lying sack of shit.

I recently asked ChatGPT a factual question: I asked it about the identity of a public figure (an artist) whom I had seen in a video on youtube. ChatGPT answered with "Person X", and even explained why Person X's contribution to the piece of art in question was so great. I knew the answer was wrong, so I retorted only with: "Source?". Then ChatGPT apologized, and did the exact same thing, just with "Person Y"; again explaining why Person Y was so influential in making that piece of art so great. I knew the answer was still wrong, so I again said: "Source?". And at the third attempt, ChatGPT finally said "Person Z", with a verifiable reference to a human-written document that identified the artist.

FUCK ChatGPT.

Nobody has yet explained how an LLM can be better than a well-paid human expert.
The not needing to pay it well.
A well paid human expert can find lots of uses of LLMs. I'm still not convinced that humans will ever be totally replaced, and what work will look like is human experts using LLMs as another tool in the toolbox, just like how an engineer would have used a slide rule or mechanical calculator back in the day. The kind of work they're good at doesn't cover the full range of necessary engineering tasks, but they do open up new avenues. For instance, yesterday I was able to get the basic gist of three solutions for a pretty complex task in about an hour. The result of that was me seeing that two of them were unlikely to work for what I'm doing, so that now I can invest actual effort in the third solution.
Tools can make individuals and teams more effective. This is just as true for LLM-based tools as it was for traditional ones.

The question is not whether one (1) LLM can replace one (1) expert.

Rather, it is how much farther an expert can get through better tooling. In my experience, it can be pretty far indeed.

  • ·
  • 15 hours ago
  • ·
  • [ - ]
I know I'm walking into a den of wolves here and will probably get buried in downvotes, but I have to disagree with the idea that using LLMs for writing breaks some social contract.

If you hand me a financial report, I expect you used Excel or a calculator. I don't feel cheated that you didn't do long division by hand to prove your understanding. Writing is no different. The value isn't in how much you sweated while producing it. The value is in how clear the final output is.

Human communication is lossy. I think X, I write X' (because I'm imperfect), you understand Y. This is where so many misunderstandings and workplace conflicts come from. People overestimate how clear they are. LLMs help reduce that gap. They remove ambiguity, clean up grammar, and strip away the accidental noise that gets in the way of the actual point.

Ultimately, outside of fiction and poetry, writing is data transmission. I don't need to know that the writer struggled with the text. I need to understand the point clearly, quickly, and without friction. Using a tool that delivers that is the highest form of respect for the reader.

I think the main problem is people using the tool badly and not producing concise material. If what they produced was really lean and correct it'd be great, but you grow a bit tired when you have to expend time reviewing and parsing long, winding and straight wrong PRs and messages from _people_ who have not put in the time.
I think often, though, people use LLMs as a substitute for thinking about what they want to express in a clear manner. The result is often a large document which locally looks reasonable and well written but overall doesn't communicate a coherent point, because there wasn't one expressed to the LLM to begin with, and even a good human writer can only mind-read so much.
The point made in the article was about the social contract, not about efficacy. Basically, if you use an LLM in such a way that the reader detects the style, you lose the reader's trust that you as the author rigorously understand what has been written, and the reader loses the incentive to pay attention.

I would extend the argument further to say it applies to lots of human-generated content as well. Especially sales and marketing information, which similarly elicits very low trust.

> The value is in how clear the final output is.

Clarity is useless if it's inaccurate.

Excel is deterministic. ChatGPT isn't.

While I understand the point you’re making, the idea that Excel is deterministic is not commonly shared among Excel experts. It’s all fun and games until it guesses that your 10th separator value, “SEP-10”, is a date.
  • mft_
  • ·
  • 11 hours ago
  • ·
  • [ - ]
I’m with you, and further, I’d apply this (with some caveats) to images created by generative AI too.

I’ve come across a lot of people recently online expressing anger and revulsion at any images or artwork that have been created by genAI.

For relatively mundane purposes, like marketing materials, or diagrams, or the sort of images that would anyway be sourced from a low-cost image library, I don’t think there’s an inherent value to the “art”, and don’t see any problem with such things being created via genAI.

Possible consequences:

1) Yes, this will likely lead to loss/shifts in employment, but hasn’t progress always been like this? People have historically reacted strongly against many such shifts when advancing technology threatens some sector, but somehow we always figure it out and move on.

2) For genuine art, I suspect this will in time lead to a greater value being placed on demonstrably human-created originals. Related: there’s probably money to be made by whoever can create a trusted system somehow capturing proof of human work, in a way that can’t be cheated or faked.

Something only a bad writer would write.
Totally agree. The output is what matters.

At this point, who really cares what the person who sees everything as "AI slop" thinks?

I would rather just interact with Gemini anyway. I don't need to read/listen to the "AI slop hunter" regurgitate their social media feed and NY Times headlines back to me like a bad language model.

  • Yeask
  • ·
  • 9 hours ago
  • ·
  • [ - ]
If the output is what matters, then by definition using a non-deterministic tool does not sound like a good idea.
> When debugging a vexing problem one has little to lose by using an LLM — but perhaps also little to gain.

This probably doesn't give them enough credit. If you can feed an LLM a list of crash dumps, it can do a remarkable job producing both analyses and fixes. And I don't mean just for super obvious crashes. I was most impressed with a deadlock where numerous engineers had tried and failed to understand exactly how to fix it.

After the latest production issue, I have a feeling that opus-4.5 and gpt-5.1-codex-max are perhaps better than me at debugging. Indeed my role was relegated to combing through the logs, finding the abnormal / suspicious ones, and feeding those to the models.
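For illustration, the log-combing step can be as dumb as this (the patterns and rarity cutoff are made up, not anyone's actual tooling):

```python
import re
from collections import Counter

# Illustrative patterns; real ones depend on the service.
SUSPICIOUS = re.compile(r"\b(ERROR|FATAL|panic|timeout|deadlock|retries exhausted)\b", re.IGNORECASE)

def suspicious_lines(path: str, rare_cutoff: int = 3) -> list[str]:
    """Keep lines matching known-bad patterns, plus lines whose 'shape'
    (text with digits collapsed) is rare -- often the anomalies worth
    pasting into a model alongside the question."""
    with open(path, encoding="utf-8", errors="replace") as f:
        lines = [line.rstrip("\n") for line in f]

    def shape(line: str) -> str:
        return re.sub(r"\d+", "N", line)

    counts = Counter(shape(line) for line in lines)
    return [line for line in lines if SUSPICIOUS.search(line) or counts[shape(line)] <= rare_cutoff]

# The surviving lines are usually small enough to paste into a prompt by hand.
# print("\n".join(suspicious_lines("service.log")))
```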
LLMs are good where there is a lot of detail but the answer to be found is simple.

This is sort of the opposite of vibe coding, but LLMs are OK at that too.

> LLMs are good where there is a lot of detail but the answer to be found is simple.

Oooo I like that. Will try and remember that one.

Amusingly, my experience is that the longer an issue takes me to debug the simpler and dumber the fix is. It's tragic really.

  • ·
  • 20 hours ago
  • ·
  • [ - ]
The empathy section is quite interesting
Hmmm, I'm a bit confused by their conclusions (encouraging use) given some of the really damning caveats they point out. A tool that they themselves determine to need such careful oversight probably just shouldn't be used near prod at all.
For the same quality and quantity output, if the cost of using LLMs + the cost of careful oversight is less than the cost of not using LLMs then the rational choice is to use them.

Naturally this doesn’t factor in things like human obsolescence, motivation and self-worth.

  • ahepp
  • ·
  • 20 hours ago
  • ·
  • [ - ]
It seems like this would be a really interesting field to research. Does AI assisted coding result in fewer bugs, or more bugs, vs an unassisted human?

I've been thinking about this as I do AoC with Copilot enabled. It's been nice for those "hmm how do I do that in $LANGUAGE again?" moments, but it's also written some nice-looking snippets that don't do quite what I want them to. And many cases of "hmmm... that would work, but it would read the entire file twice for no reason".
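To illustrate the kind of thing I mean (hypothetical puzzle input with two columns of integers): both versions below are correct, but the first reads and parses the file twice for no reason, which is exactly what you only catch by reviewing the suggestion.

```python
from pathlib import Path

# The kind of snippet an assistant might suggest: parses the same file twice.
def totals_two_passes(path: str) -> tuple[int, int]:
    left = sum(int(line.split()[0]) for line in Path(path).read_text().splitlines())
    right = sum(int(line.split()[1]) for line in Path(path).read_text().splitlines())
    return left, right

# Same result, one pass over the input.
def totals_one_pass(path: str) -> tuple[int, int]:
    left = right = 0
    for line in Path(path).read_text().splitlines():
        a, b = line.split()
        left += int(a)
        right += int(b)
    return left, right
```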

My guess, however, is that it's a net gain for quality and productivity. Humans make bugs too and there need to be processes in place to discover and remediate those regardless.

I'm not sure about research, but I've used LLMs for a few things here at Oxide with (what I hope is) appropriate judgment.

I'm currently trying out using Opus 4.5 to take care of a gnarly code reorganization that would take a human most of a week to do -- I spent a day writing a spec (by hand, with some editing advice from Claude Code), having it reviewed as a document for humans by humans, and feeding it into Opus 4.5 on some test cases. It seems to work well. The spec is, of course, in the form of an RFD, which I hope to make public soon.

I like to think of the spec as basically an extremely advanced sed script described in ~1000 English words.

Maybe it's not as necessary with a codebase as well-organized as Oxide's, but I found gemini 3 useful for a refactor of some completely test-free ML research code, recently. I got it to generate a test case which would exercise all the code subject to refactoring, got it to do the refactoring and verify that it leads to exactly the same state, then finally got it to randomize the test inputs and keep repeating the comparison.
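Sketching that loop (with `old_impl`/`new_impl` as hypothetical stand-ins for the pre- and post-refactor code paths, and made-up tolerances):

```python
import random

def old_impl(xs: list[float]) -> float:
    """Stand-in for the pre-refactor code path."""
    total = 0.0
    for x in xs:
        total += x * x
    return total

def new_impl(xs: list[float]) -> float:
    """Stand-in for the refactored code path; it must reach the same result."""
    return sum(x * x for x in xs)

def check_equivalence(trials: int = 1000, seed: int = 0) -> None:
    """Repeatedly compare both implementations on randomized inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.uniform(-1e6, 1e6) for _ in range(rng.randrange(0, 50))]
        a, b = old_impl(xs), new_impl(xs)
        assert abs(a - b) <= 1e-6 * max(1.0, abs(a)), (xs, a, b)

check_equivalence()
print("old and new implementations agree on randomized inputs")
```

The useful part is less the code than the habit: generate the characterization test before letting the model touch anything, then keep rerunning it as the refactor proceeds.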
  • Yeask
  • ·
  • 9 hours ago
  • ·
  • [ - ]
These companies have trillions and they are not doing that research. Why?
And it doesn't factor in seniority/experience. What's good for a senior developer is not necessarily the same for a beginner.
Medication is littered with warning labels but humans still use it to combat illness. Social media can harm mental health yet people still use it. Pick whatever other example you'd like.

There are things in life that have high risks of harm if misused yet people still use them because there are great benefits when carefully used. Being aware of the risks is the key to using something that can be harmful, safely.

I would think some of their engineers love using LLMs, it would be unfair to them to completely disallow it IMO (even as someone who hates LLMs)
Junior engineers are the usual comparison folks make to LLMs, which is apt as juniors need lots of oversight.
What do you find confusing about the document encouraging use of LLMs?

The document includes statements like "LLMs are superlative at reading comprehension", "LLMs can be excellent editors", "LLMs are amazingly good at writing code".

The caveats are really useful: if you've anchored your expectations on "these tools are amazing", the caveats bring you closer to what they've observed.

Or, if you're anchored on "the tools aren't to be used", the caveats give credibility to the document's suggestions of what LLMs are useful for.

There’s a lot of code that doesn’t hit prod.
The ultimate conclusion seems to be one that leaves it to personal responsibility - the user of the LLM is responsible for ensuring the LLM has done its job correctly. While this is the ethical conclusion to me, the “gap” left to personal responsibility is so large that it makes me question how useful everything else in this document really is.

I don’t think it is easy to create a concise set of rules to apply in this gap for something as general as LLM use, but I do think such a ruleset is noticeably absent here.

  • keeda
  • ·
  • 16 hours ago
  • ·
  • [ - ]
Here's the only simple, universal law that should apply:

THOU SHALT OWN THE CODE THAT THOU DOST RENDER.

All other values should flow from that, regardless of whether the code itself is written by you or AI or by your dog. If you look at the values in the article, they make sense even without LLMs in the picture.

The source of workslop is not AI, it's a lack of ownership. This is especially true for Open Source projects, which are seeing a wave of AI slop PRs precisely because the onus of ownership is largely on the maintainers and not the upstart "contributors."

Note also that this does not imply a universal set of values. Different organizations may well have different values for what ownership of code means -- e.g. in the "move fast, break things" era of Facebook, workslop may have been perfectly fine for Zuck! (I'd bet it may even have hastened the era of "Move fast with stable infrastructure.") But those values must be consistently applied regardless of how the code came to be.

"LLMs can be quite effective writing code de novo."

Maybe for simple braindead tasks you can do yourself anyway.

Try doing it on something actually hard or complex and they get it wrong 100/100 if they don't have adequate training data, and 90/100 if they do.

Cantrill jumps on every bandwagon. When he assisted in cancelling a Node developer (not a native English speaker) over pronouns, he was following the Zeitgeist; now "Broadly speaking, LLM use is encouraged at Oxide."

He is a long way from Sun.

I didn't know about that incident before starting at Oxide, but if I'd known about it, it absolutely would have attracted me. I've written a large amount of technical content and not once in over a decade have I needed to use he/him pronouns in it. Bryan was 100% correct.
  • ·
  • 6 hours ago
  • ·
  • [ - ]
Joyent took funding from Peter Thiel. I have not seen attacks from Cantrill against Thiel for his political opinions, so he just punches down for street cred and goes against those he considers expendable.

What about Oxide? Oxide is funded by Eclipse Ventures, which has now installed a Trump-friendly person:

https://www.reuters.com/business/finance/vc-firm-eclipse-tap...

For those interested, here's a take from Bryan after that incident https://bcantrill.dtrace.org/2013/11/30/the-power-of-a-prono...
The change: https://github.com/joyent/libuv/pull/1015/files

> Sorry, not interested in trivial changes like that.

- bnoordhuis

As a non-native English speaker, I think the change itself is okay (women will also occasionally use computers), but saying you're not interested in merging it is kinda cringe, for lack of a better term - do you not realize that people will take issue with this and you're turning a trivial change into a messy discussion? Stop being a nerd and merge the damn changeset, it won't break anything either, read the room. Admittedly, I also view the people arguing in the thread to be similarly cringe, purely on the basis that if someone is uninterested/opposed to stuff like this, you are exceedingly unlikely to be able to make them care.

Feels the same as how allowlist/denylist reads more cleanly, as well as main for a branch name uses a very common word as well - as long as updating your CI config isn't too much work. To show a bit of empathy the other way as well, maybe people get tired of too many changes like that (e.g. if most of the stuff you review is just people poking the docs by rewording stuff to be able to say that they contributed to project X). Or maybe people love to take principled stances and to argue idk

> ...it’s not the use of the gendered pronoun that’s at issue (that’s just sloppy), but rather the insistence that pronouns should in fact be gendered.

Yeah, odd thing to get so fixated on when the they/them version is more accurate in this circumstance. While I don't cause drama when I see gendered ones (again, most people here have English as a second language), I wouldn't argue with someone a bunch if they wanted to correct the docs or whatever.

Also for those interested, here is Bryan's take on criticism of Sun:

https://landley.net/history/mirror/linux/kissedagirl.html

He wasn't fired or canceled. It is great to see Gen-Xers and Boomers having all the fun in the 1980s and 1990s and then going all prissy on younger people in the 2010s and trying to ruin their careers.

I fully disagree with 1) the stance, 2) the conclusions.
The problem with this text is it's a written anecdote. Could all be fake.
I find it interesting that the section about the tells of LLM-assisted writing is itself absolutely littered with em-dashes.
To be fair, LLMs usually use em-dashes correctly, whereas I think this document misuses them more often than not. For example:

> This can be extraordinarily powerful for summarizing documents — or of answering more specific questions of a large document like a datasheet or specification.

That dash shouldn't be there. That's not a parenthetical clause, that's an element in a list separated by "or." You can just remove the dash and the sentence becomes more correct.

I don't know whether that use of the em-dash is grammatically correct, but I've seen enough native English writers use it like that. One example is Philip K Dick.
Perhaps you have—or perhaps you've seen this construction instead, where (despite also using "or") the phrase on the other side of the dash is properly parenthetical and has its own subject.
LLMs also generally don't put spaces around em dashes — but a lot of human writers do.
I think you're thinking of British-style "en-dashes" – which are often used for something that could have been set off by brackets, and do have a space on either side – rather than "em" dashes. They can also be used in a similar place as a colon – that is, to separate two parts of a single sentence.

British users regularly use that sort of construct with "-" hyphens, simply because they're pretty much the same and a whole lot easier to type on a keyboard.

You can stop LLMs from using em-dashes by just telling them to "never use em-dashes". This same type of prompt engineering works to mitigate almost every sign of AI-generated writing, which is one reason why AI writing heuristics/detectors can never be fully reliable.
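For illustration, both halves of that idea fit in a few lines; the rule text and word list are made up, and as noted, passing such a check proves nothing:

```python
# Illustrative style instruction to put in a system prompt.
STYLE_RULES = (
    "Never use em-dashes. Avoid stock phrases like 'delve' or 'rich tapestry'. "
    "Write plain declarative sentences and do not summarize with bullet points unless asked."
)

# A few surface-level "tells"; "\u2014" is the em dash character.
TELLS = ["\u2014", "delve", "tapestry", "it's worth noting"]

def find_tells(text: str) -> list[str]:
    """Return which surface-level tells appear in the text. Passing this
    check proves nothing, which is the point: the tells are trivially
    prompted (or edited) away."""
    lower = text.lower()
    return [t for t in TELLS if t in lower]

print(find_tells("We delve into a rich tapestry of ideas \u2014 briefly."))
# -> the em dash, "delve", and "tapestry" are all flagged
```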
  • dcre
  • ·
  • 20 hours ago
  • ·
  • [ - ]
This does not work on Bryan, however.
I guess, but even if you set aside any obvious tells, pretty much all expository writing out of an LLM still reads like pablum, with no real conviction and tons of hedges against observed opinions.

"lack of conviction" would be a useful LLM metric.

I ran a test for a potential blog post where I took every indicator of AI writing and told the LLM "don't do any of these", and it resulted in high school AP English quality writing. Which could be considered the "lack of conviction" level of writing.
I believe Bryan is a well known em dash addict
  • rl3
  • ·
  • 19 hours ago
  • ·
  • [ - ]
>I believe Bryan is a well known em dash addict

I was hoping he'd make the leaderboard, but perhaps the addiction took proper hold in more recent years:

https://www.gally.net/miscellaneous/hn-em-dash-user-leaderbo...

https://news.ycombinator.com/user?id=bcantrill

No doubt his em dashes are legit, of course.

And I mean no disrespect to him for it, it’s just kind of funny
There was a comment recently by HN's most enthusiastic LLM cheerleader, Simon Willison, that I stopped reading almost immediately (before seeing who posted it), because it exuded the slop stench of an LLM: https://news.ycombinator.com/item?id=46011877

However, I was surprised to see that when someone (not me) accused him of using an LLM to write his comment, he flatly denied it: https://news.ycombinator.com/item?id=46011964

Which I guess means (assuming he isn't lying) if you spend too much time interacting with LLMs, you eventually resemble one.

I don't know what to tell you: that really does not read like it was written by a LLM. You were perhaps set off by the very first sentence, which sounds like it was responding to a prompt?
> if you spend too much time interacting with LLMs, you eventually resemble one

Pretty much. I think people who care about reducing their children's exposure to screen time should probably take care to do the same for themselves wrt LLMs.

Based on paragraph length, I would assume that "LLMs as writers" is the most extensive use case.
I disagree with "LLMs as Editors." The amount of em-dashes in the post is crazy.
Funny how the article states that "LLMs can be excellent editors" and then the post repeats all the mistakes that no editor would make:

1. Because reading posts like this 2. Is actually frustrating as hell 3. When everything gets dragged around and filled with useless anecdotes and 3 adjective mumbojumbos and endless emdashes — because somehow it's better than actually just writing something up.

Which just means that people in tech, or in general, have no understanding of what an editor does.

Funnily enough, the text is so distinctively Cantrillian that I have no doubts this is 100% an “organic intelligence” product.
I had trouble getting past the Early Modern English tinge of the language used in this. It’s fun, but it distracts from comprehension in an attempt to just sound epic. It’s fine if you’re writing literature, but it comes off sounding uppity in a practical doc for devs. Writing is not just about conveying something in a mood you wish to set. Study how Richard Feynman and Warren Buffett communicated with their audiences; part of their success is that they spoke to their people in language all can easily understand.
  • dcre
  • ·
  • 5 hours ago
  • ·
  • [ - ]
Feynman at the 1965 Nobel banquet: “Each joy, though transient thrill, repeated in so many places amounts to a considerable sum of human happiness. And, each note of affection released thus one upon another has permitted me to realize a depth of love for my friends and acquaintances, which I had never felt so poignantly before.”

https://www.nobelprize.org/prizes/physics/1965/feynman/speec...

What do you mean? The document seemed incredibly digestible to me.

Are you speaking about words like “shall”? I didn’t notice them, but in RFCs those are technical terms which carry precise meaning.

Here it is, rewritten in accessible English:

Using Large Language Models (LLMs) at Oxide

This document explains how we should think about using LLMs (like ChatGPT or similar tools) at Oxide.

What are LLMs?

LLMs are very advanced computer programs that can understand and generate text. They've become a big deal in the last five years and can change how we work. But, like any powerful tool, they have good and bad sides. They are very flexible, so it’s hard to give strict rules about how to use them. Still, because they are changing so fast, we need to think carefully about when and how we use them at Oxide.

What is Important When Using LLMs

We believe using LLMs should follow our core values:

Responsibility:

We are responsible for the work we produce. Even if we use an LLM to help, a human must make the final decisions. The person using the LLM is responsible for what comes out.

Rigor (Care and Precision):

LLMs can help us think better or find mistakes, but if we use them carelessly, they can cause confusion. We should use them to improve our work, not to cut corners.

Empathy:

Remember, real people read and write what we produce. We should be kind and respectful in our language, whether we are writing ourselves or letting an LLM help.

Teamwork:

We work as a team. Using LLMs should not break trust among team members. If we tell others we used an LLM, it might seem like we’re avoiding responsibility, which can hurt trust.

Urgency (Doing Things Quickly):

LLMs can help us work faster, but we shouldn’t rush so much that we forget responsibility, care, and teamwork. Speed is good, but not at the cost of quality and trust.

How We Use LLMs

LLMs can be used in many ways. Here are some common uses:

1. As Readers

LLMs are great at quickly understanding documents, summaries, or answering questions about texts.

Important: When sharing documents with an LLM, make sure your data is private. Also, remember that uploading files might allow the LLM to learn from your data unless you turn that off.

Note: Use LLMs to help understand documents, but don’t skip reading them yourself. LLMs are tools, not replacements for reading carefully.

2. As Editors

LLMs can give helpful feedback on writing, especially after you’ve written a draft. They can suggest improvements in structure and wording.

Caution: Sometimes, LLMs may flatter your work too much or change your style if used too early. Use them after you’ve done some work yourself.

3. As Writers

LLMs can write text, but their writing can be basic or obvious. Sometimes, they produce text that shows it was made by a machine.

Why be careful? If readers see that the writing is from an LLM, they might think the author didn’t put in enough effort or doesn’t truly understand the ideas.

Our rule: Usually don’t let LLMs write your final drafts. Use them to help, but own your words and ideas.

4. As Code Reviewers

LLMs can review code and find problems, but they can also miss issues or give bad advice. Use them as a helper, not a replacement for human review.

5. As Debuggers

LLMs can sometimes help find solutions to tricky problems. They might give helpful hints. But don’t rely on them too much—use them as a second opinion.

6. As Programmers

LLMs are very good at writing code, especially simple or experimental code. They can be useful for quick tasks like writing tests or prototypes.

Important: When an LLM writes code, the person responsible must review it carefully. Responsibility for the code stays with the human.

Teamwork: If you use an LLM to generate code, make sure you understand and review it yourself first.

How to Use LLMs Properly

There are detailed guidelines and tips in the internal document called "LLMs at Oxide."

In general:

Using LLMs is encouraged, but always remember your responsibilities—to your product, your customers, and your team.