Packages were supposed to replace programming. They got you 70% of the way there as well.
Same with 4GLs, Visual Coding, CASE tools, even Rails and the rest of the opinionated web tools.
Every generation has to learn “There is no silver bullet”.
Even though Fred Brooks explained why back in 1986: there are essential tasks and there are accidental tasks, and the tools really only help with the accidental ones.
AI is a fabulous tool that is way more flexible than previous attempts because I can just talk to it in English and it covers every accidental issue you can imagine. But it can’t do the essential work of complexity management for the same reason it can’t prove an unproven maths problem.
As it stands we still need human brains to do those things.
AI is great at getting you started or setting up scaffolds that are common to all tasks of a similar kind. Essentially anything with an identifiable pattern. It’s yet another abstraction layer sitting on an abstraction layer.
I suspect this is the reason we're really only seeing AI agents used in call centers, essentially as stand-ins for chatbots, because chatbots are designed to automate highly repetitive, predictable tasks like changing an address or initiating a dispute. But for things like "I have a question about why I was charged $24.38 on my last statement," you will still be escalated to an agent, because inquiries like that require a human to investigate and interpret an unpredictable pattern.
But creative tasks model the real world, which is inherently analog and ever-changing. Closing that gap, identifying what's missing between what you have and the real world and coming up with creative solutions, is what humans excel at.
Self driving, writing emails, generating applications- AI gets you a decent starting point. It doesn’t solve problems fully, even with extensive training. Being able to fill that gap is true AI imo and probably still quite a ways off.
Wishful thinking? You'll just get kicked out of the chat because all the agents have been fired.
You know what is even cheaper, more scalable, more efficient, and more user-friendly than a chatbot for those use cases?
A run of the mill form on a web page. Oh, and it's also more reliable.
> A run of the mill form on a web page. Oh, and it's also more reliable.
Web-accessible forms are great for asynchronous communication and queries but are not as effective in situations where the reporter doesn't have a firm grasp on the problem domain.
For example, a user may know printing does not work but may be unable to determine if the issue is caused by networking, drivers, firmware, printing hardware, etc.
A decision tree built from the combinations of even a few models of printer and their supported computers could be massive.
In such cases, hiring people might be more effective, efficient, and scalable than creating and maintaining a web form.
Hum... Your point is that LLMs are more effective?
Because, of course people are, but that's not the point. Oh, and if you do create that decision tree, do you know how you communicate it better than with a chatbot? You do that by writing it down, as static text, with text-anchors on each step.
Are they?
If the LLM could talk to grandma for 40 minutes until it figures out what her problem actually is, as opposed to what she thinks it is, and then transfer her to a person with the correct context to resolve it, I think that's probably better than most humans in a customer service role. Chatting with grandma while she rambles for an extended amount of time is not something that very many customer service people can put up with day in and day out.
The problem is that companies will use the LLMs to eliminate customer service roles rather than make them better.
Ok, but it can't. If we had superhuman AGI, it would beat people in customer service, yes.
None of these tools hurt, but you still need to comprehend the problem domain and the tools -- not least because you have to validate proposed solutions -- and AI cannot (yet) do that for you. In my experience, generating code is a relatively small part of the process.
https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-a...
“The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise.” — Edsger Dijkstra
The abstraction has never been this leaky or vague — Me
If you can code, and you understand the problem (or are well on your way to understanding it), but you're not familiar with the exact syntax of Go or whatever, then working with AI will save you hundreds of hours.
If you can't code, or do not (yet) understand the problem, AI won't save you. It will probably hurt.
I used Claude recently to refresh my knowledge on the browser history API and it said that it gets cleared when the user navigates to a new page because the “JavaScript context has changed”
I have the experience and know how to verify this stuff, but a new engineer may not, and that would be awful.
Things like these made me cancel all my AI subscriptions and just wait for whatever comes after transformers.
"Of course not," replied Bohr, "but I understand it brings you luck whether you believe it or not."
s/luck/lower negative log likelihood. Besides, a participant can still think about and glean truths from their reflections about a conversation they had which contained only false statements.
That it’s better to be superstitious than not, because maybe it matters?
That was GP's point. Steve Yegge has also talked about this.
Someone who does not have that experience would be much more harmed by the false information LLMs can and do spit out.
Good knowledge of facts, willing to give anything a try, but not much wisdom.
I haven’t seen this demonstrated in gpt-4 or Claude sonnet when asking anything beyond the most extreme basics.
I consistently get subtly wrong answers and whenever I ask “oh okay, so it works like this” I always get “Yes! Exactly. You show a deep understanding of…” even though I was wrong based on the wrong info from the LLM.
Useless for knowledge work beyond RAG, it seems.
Search engines that I need to double check are worse than documentation. It’s why so many of us moved beyond stack overflow. Documentation has gotten so good.
…which sometimes feels like it is more work than just fucking doing it yourself. So yeah. I dunno!
I used GitHub Copilot in a project I started mostly to learn Go. It was amazing. I spent much less time fiddling around with syntax, and much more time thinking about design.
Could code assistants be used to help actually learn a programming language?
- Absolutely.
Will the majority of people that use an LLM to write a Swift app actually do this?
- Probably not, they'll hammer the LLM until it produces code that hobbles along and call it a day.
Also, LEARNING is aided by being more active, but relying on an LLM inherently encourages you to adopt a significantly more passive behavior (reading rather than writing).
Terraform? Hah. 4o and even o1 both absolutely sucked at it. You could copy & paste the documentation for a resource provider, examples and all, and it would still produce almost unusable code. Which was not at all helpful given I didn’t know the language or its design patterns and best practices at all. Sonnet 3.5 did significantly better but still required a little hand holding. And while I got my cloud architecture up and running now I question if I followed “best practices” at all. (Note: I don’t really care if I did though… I have other more important parts of my project to work on, like the actual product itself).
To me one of the big issues with these LLMs is they have zero ability to do reflection and explain their "thought process". And even if they could, you cannot trust what they say, because they could be spouting off whatever random training data they hoovered up, or they could be "aligned" to agree with whatever you tell them.
And that is the thing about LLMs. They are remarkably good bullshitters. They'll say exactly what you want them to and be right just enough that they fool you into thinking they are something more than an incredibly sophisticated next token generator.
They are both incredibly overrated and underrated at the same time. And it will take us humans a little while to fully map out what they are actually good at and what they only pretend to be good at.
Of course everything needs double-checking, but just asking the LLM "how do I do X" will usually at least output the names of the terraform resources and most of the configuration attributes I need to look up.
They are great for any kind of work that requires "magical incantations" as I like to call them.
Software development is absolutely a fractal. In the 1960s we tackled complexity by using high-level languages that compile to machine code, enabling more people to write simple code. This has happened again and again and again.
But different generations face different problems, which require another level of thinking and abstraction, pushing the boundaries until we reach the next generation. None of this is solved by a single solution, but by combinations built on basic principles that never change, and those things, at least for now, only humans can do.
For example, the jump in productivity from adding an operating system to a computer is orders of magnitude larger than adding an LLM to a web development process despite the LLM requiring infrastructure that cost tens of billions to create.
It seems that while tools are getting more and more sophisticated, they aren't really resulting in much greater productivity. It all still seems to result in software that solves the same problems as before. Whereas when HTML came around, it opened up use cases that had never been seen before, despite being a very simple abstraction layer by today's standards.
Perhaps the opportunities are greatest when you are abstracting the layer that the fewest understand when LLMs seem to assume the opposite.
The issue with LLMs is they enshrine the status quo. I don't want ossified crappy software that's hard to work with. Frameworks and libraries should have to fight to justify their existence in the marketplace of ideas. Subverting this mechanism is how you ruin software construction.
Another funny thing is that we are using LLMs to replace creative professionals, but real creativity comes from human experience, perception, and our connections, which are exactly what LLMs are missing.
Not a robot to do my art so I can spend more time on dishes, laundry and cleaning.
https://www.physicalintelligence.company/blog/pi0
Software moves faster than hardware
Would you rather write some code, or constantly code-review verbose, sneakily wrong junior level code?
What important problem do you have for which "AI generated art" is the answer?
You seriously want to claim you don't already have enough "content" to waste your free time consuming?
Curation of content is also a problem, but if we can come up with better solutions there, generative AI will absolutely result in more and better content for everyone while enabling a new generation of creators.
AI is more likely to contribute to the 1000s.
There is unlimited content online; that doesn't mean there are 100 movies worth watching in any given year. Maybe not even 10.
Content creation is not the problem
Content curation is
Writing dumb scripts that can call out to sophisticated LLMs to automate parts of processes is utterly game changing. I saved at least 200 hours of mundane work this week and it was trivial.
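For what it's worth, "dumb script" really can be this dumb; a minimal sketch, assuming the OpenAI Python client and a made-up folder of ticket files to summarize:

    # Minimal sketch: loop over text files, ask the model to summarize each one,
    # write the answer next to the original. Assumes the OpenAI Python client and
    # an OPENAI_API_KEY in the environment; the file layout and prompt are made up.
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()

    for path in Path("tickets").glob("*.txt"):
        reply = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Summarize this customer ticket in three bullet points."},
                {"role": "user", "content": path.read_text()},
            ],
        )
        Path(f"{path.stem}_summary.txt").write_text(reply.choices[0].message.content)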
I think you're also right about LLMs. I think the path forward in programming is embracing more formal tools. Incidentally, searching for method references is more formal than grepping, and that's probably why people prefer it.
I think this analogy is more apt than you may realize. Just like a fractal, the iterated patterns get repeated on a much smaller scale. The jump to higher-level languages was probably a greater leap than the rest of software innovation will provide. And with each iterative gain we approach some asymptote, but never get there. And this frustration of never reaching our desired outcome results in ever louder hype cycles.
I get your sentiment, I've been through a few hype cycles as well, but besides learning that history repeats itself, there is no saying how it will repeat itself.
I don't know why this is a bad thing. Projects you believe are nonsensical shouldn't cease to exist just because of your opinion, especially if they're helping people survive in this world. I'm sure the people working on them don't think they're nonsensical.
The rise of antidepressants implies otherwise.
The more bullshit we stop doing, the more energy there is for the awesome things we could do instead.
The arts have a place in society. Tackling real problems like hunger or health do too, arguably more so - they create the space for society to tolerate, if not enjoy art.
But the down side is we have a huge smear of jobs that either don't really matter or only matter for the smallest of moments that exist in this middle ground. I like to think of a travel agent of yesteryear as the perfect example: someone who makes a professional experience of organising your leisure so you don't have to; using questionable industry deals. This individual does not have your consumer interests at heart, because being nice to you is not where the profit is generally. The only role they actually play is rent seeking.
Efficiency threatens the rent seeking models of right now, but at the same time leads to a Cambrian explosion of new ones.
Yet society and the economy keep going, and nobody outside of some academic discussions really cares. I mean, companies have every incentive to trim fat to raise income, yet they only do the bare minimum.
It may not be tab-tab-tab all the way, but a whole lot more tabs will sneak in.
Humankind will just develop a lot more software, faster and further into industries.
Availability will create new demand.
I design top-down, component by component, and sometimes the parts don't fit together as I envisioned, so I have to write adapters or - worst case - return to the drawing board. If the AI could predict these mismatches, that would also be helpful.
Unfortunately, I don't think AI is great with only the accidental tasks either.
I don't know this reference, so I have to ask: Was "accidental" supposed to be "incidental"? Because I don't see how "accidental" makes any sense.
Chapter 16 is named "No Silver Bullet—Essence and Accident in Software Engineering."
I'll type out the beginning of the abstract at the beginning of the chapter here:
"All software construction involves essential tasks, the fashioning of the complex conceptual structures that compose the abstract software entity, and accidental tasks, the representation of these abstract entities in programming languages and the mapping of these onto machine languages within space and speed constraints. Most of the big past gains in software productivity have come from removing artificial barriers that have made the accidental tasks inordinately hard, such as severe hardware constraints, awkward programming languages, lack of machine time. How much of what software engineers now do is still devoted to the accidental, as opposed to the essential? Unless it is more than 9/10 of all effort, shrinking all the accidental activities to zero time will not give an order of magnitude improvement."
By accidental, he means "non-fundamental complexity". If you express a simple idea in a complex way, the accidental complexity of what you said will be high, because what you said was complex. But the essential complexity is low, because the idea is simple.
Anniversary edition, p182.
"... let us examine its difficulties. Following Aristotle, I divide them into essence - the difficulties inherent in the nature of the software - and accidents - those difficulties that today attends its production but that are not inherent"
I have almost a dozen viz books. Some written over 50 years ago.
While they impart knowledge, I want the knowledge but also the application. I'm going to go out and paint that bike shed. You can go read Tufte or "Show me the Numbers" but I will show you how to get the results.
Right there is your problem. Read the Mythical Man-Month and Design of Design. They are not long books and it's material that's hard to find elsewhere. Old rat tacit knowledge.
How many of those things were envisioned by futurists or great authors? This AI stuff is the stuff of dreams, and I think it’s unwise to consider it another go around the sun.
Yes, a powerful tool, and as powerful tools go, they can reshape how things get done. But it's a tool nonetheless, and therefore we must consider what its limits are, which is all OP is getting at; the current and known near-future state suggests we aren't evolving past the tool stage.
This AI stuff? No, not really. The stuff of dreams is an AI that you can talk to and interact infinitely and trust that it doesn’t make mistakes. LLMs ain’t it.
Like, for example, all the big-data stuff we do today was unthinkable 10 years ago, today every mid-sized company has a data team. 15 years ago all data in a single monolithic relational database was the norm, all you needed to know was SQL and some Java/C#/PHP and some HTML to get some data wired up into queries.
LLMs absolutely can "detect intent" and correct buggy code. e.g., "this code appears to be trying to foo a bar, but it has a bug..."
What I personally would like AI to do would be to refactor the program so it would be shorter/clearer, without changing its semantics. Then, I (human) could easily review what it does, whether it conforms to the specification. (For example, rewrite the C program to give exactly the same output, but as a Python code.)
In cases where there is a peculiar difference between the desired semantics and real semantics, this would become apparent as additional complexity in the refactored program. For example, there might be a subtle semantic differences between C and Python library functions. If the refactored program would use a custom reimplementation of C function instead of the Python function, it would indicate that the difference matters for the program semantics, and needs to be somehow further specified, or it can be a bug in one of the implementations.
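A tiny concrete instance of the kind of semantic gap meant here: integer division and modulo of negative numbers behave differently in C and Python, so a faithful "same output" refactor has to spell out the C behaviour, and that extra code is exactly the signal that something needs to be specified:

    # C truncates integer division toward zero; Python floors it. For negative
    # operands the results differ, so a refactor that preserves the C program's
    # output has to reimplement the C behaviour explicitly.
    def c_div(a: int, b: int) -> int:
        q = abs(a) // abs(b)
        return q if (a >= 0) == (b >= 0) else -q

    def c_mod(a: int, b: int) -> int:
        return a - c_div(a, b) * b

    print(-7 // 3, -7 % 3)              # Python semantics: -3  2
    print(c_div(-7, 3), c_mod(-7, 3))   # C semantics:      -2 -1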
It’s not either/or of course, and AI can help
But sometimes it takes another leap beyond the current set of test cases
Once you write sufficiently detailed unit tests, the AI writes the implementation.
Mind you, when you treat the implementation as the documentation, it raises the question of what you need the tests for in the first place.
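"Sufficiently detailed" is doing a lot of work there, though. A sketch of what it might mean in practice; slugify() is a made-up example, not anyone's real API:

    # Edge cases pinned down tightly enough that the implementation is nearly
    # forced by the tests. slugify() is hypothetical.
    import re
    import pytest

    def slugify(text: str) -> str:
        # The kind of implementation the tests below essentially dictate.
        cleaned = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
        if not cleaned:
            raise ValueError("nothing left to slugify")
        return cleaned

    def test_basic_words():
        assert slugify("Hello World") == "hello-world"

    def test_collapses_whitespace_and_punctuation():
        assert slugify("  Hello,   World! ") == "hello-world"

    def test_existing_hyphens_are_not_doubled():
        assert slugify("foo - bar") == "foo-bar"

    def test_empty_input_raises():
        with pytest.raises(ValueError):
            slugify("")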
In an era when UIs become ever more Hieroglyphic(tm), Aesthetical(tm), and Nouveau(tm), "AI" revolutionizing and redefining the whole concept of interacting with computers as "Just speak Human." is a wild breath of fresh air.
It's akin to how everyone can build a shelter, but building a house requires more specialized knowledge. The cost of the latter is training time to understand stuff. The cost of programming is also training time, to understand how things work and how to manipulate them.
Most people can't use mice or keyboards with speed, touchscreens are marginally better except all the "gestures" are unnatural as hell, and programming is pig latin.
Mice and keyboards and programming languages and all the esoteric ways of communicating with computers came about simply because we couldn't just talk Human to them. Democratizing access to computers is a very good and very productive thing.
Generative AI can be thought of as an interface to the tool, but it's been proven to be unreliable. And as the article outlines, if it can get you 70% of the way through a task but you don't have the knowledge required to complete it, that's pretty much the same as 0%. And if you have the knowledge, more often than not you realize it just goes faster on a zigzag instead of the straight route you would have taken with more conventional tools.
that aged like LeCun
I don't think that's really the problem with using LLMs for coding, although it depends on how you define "accidental". I suppose if we take the opposite of "essential" (the core architecture, planned to solve the problem) to be boilerplate (stuff that needs to be done as part of a solution, but doesn't itself really define the solution), then it does apply.
It's interesting/amusing that on the surface a coding assistant is one of the things that LLMs appear better suited for, and they are suited for, as far as boilerplate generation goes (essentially automated stack overflow, and similar-project, cut and pasting)... But, in reality, it is one of the things LLMs are LEAST suited for, given that once you move beyond boilerplate/accidental code, the key skills needed for software design/development are reasoning/planning, as well as experienced-based ("inference time") learning to progress at the craft, which are two of the most fundamental shortcomings of LLMs that no amount of scale can fix.
So, yeah, maybe they can sometimes generate 70% of the code, but it's the easy/boilerplate 70% of the code, not the 30% that defines the architecture of the solution.
Of course it's trendy to call LLMs "AI" at the moment, just as previous GOFAI attempts at AI (e.g. symbolic problem solvers like SOAR, expert systems like CYC) were called "AI" until their limitations became more apparent. You'll know we're one step closer to AI/AGI when LLMs are in the rear view mirror and back to just being called LLMs again!
If you aim for the stars and fail, you'll end up on the moon.
Most of the "you'll never need programmers again!" things have ended up more "cars-showered-with-chunks-of-flaming-HTPB" than "accidentally-land-on-moon", tbh. 4GLs had an anomaly, and now we don't talk about them anymore.
(It's a terrible adage, really. "Oops, the obviously impossible thing didn't happen, but an unrelated good thing did" just doesn't happen that often, and when it does there's rarely a causal relation between A and B.)
AI will only get better.
And AI has proven a lot of unproven maths problems as far back as 2019: https://mathscholar.org/2019/04/google-ai-system-proves-over...
Programming today is literally hundreds of times more productive than in 1950. It doesn’t feel that way because of scope creep, but imagine someone trying to create a modern AAA game using only assembly and nothing else. C didn’t show up until the 70’s, and even Fortran was a late 50’s invention. Go far enough back and people would set toggle switches and insert commands that way no keyboards whatsoever.
Move forward to the 1960s and people coded on stacks of punch cards and would need to wait overnight for access to a compiler. So just imagine the productivity boost of a text editor and a compiler. I'm not talking about an IDE with syntax checks etc., just a simple text editor was a huge step up.
And so forth.
You're missing things like LISP and Forth, which allowed for a lot of productivity early on. It usually came with a performance cost, though.
And PHP (the programming language) just before that, that was a huge change in "democratising" programming and making it easier, we wouldn't have had the web of the last 20-25 years without PHP.
Just like we did at every earlier stage.
That said, it's a really nice tool. AI will probably be part of most developers' toolkits moving forward, the way LSP and basic IDE features are.
The comparisons are lacking and are almost at whataboutism level.
The amount of actual 'work' that AI does versus the tools of yesterday is an order of magnitude apart.
In that view I'd say the productivity boost by LLMs is somewhat disappointing, especially with respect to how amazing they are.
From memory, they took some old Java projects and had some LLM-driven "agents" update the codebase to recent Java. I don't know Java well enough to know how "hard" this task is, but asking around, I've heard that "analog" tools for this exist, but they aren't that good, bork often, are hardcoded and so on.
Amazon reported ~70% of the code that came out passed code review; presumably the rest had to be tweaked by humans. I don't know of any "classical" tools that can do that out of the box. So yeah, that's already impressive and "available today" so to speak.
Really it's not about just using technology, but how you use it. Lots of adults expected kids with smartphones to be generally good with technology, but that's not what we're witnessing now. It turns out browsing TikTok and Snapchat doesn't teach you much about things like file system, text editing, spreadsheets, skills that you actually need as a typical office worker.
I think this is the "closing the loop" ( https://en.wikipedia.org/wiki/Control_loop#Open-loop_and_clo... ) moment for coding AI.
All pieces are there, we just need to decide to do it. Today's AI can produce an increasingly tangled mess of code. But it can also reorganize the code. It can write test code and assess the quality of the code. It can even make architectural decisions.
Today's AI code is more like a Frankenstein's composition. But with the right prompt OODA loop and quality-assessment rigor, it boils down to sorting and cleaning the junk pile faster than you produce it.
Once you have a coherent unified codebase, things get fast quickly; capabilities grow exponentially with the number of lines of code. Think of things like the Julia language or the Wolfram Language.
Once you have a well written library or package, you are more than 95% there and you almost don't need AI to do the things you want to do.
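A minimal sketch of what such a loop could look like, with the model calls left as hypothetical callables rather than any particular vendor's API:

    # Closed loop: draft, run the tests, feed the failures back, repeat.
    # `generate` and `refine` stand in for whatever model API you use;
    # the target file name is a placeholder.
    import subprocess
    from pathlib import Path
    from typing import Callable

    def run_tests() -> tuple[bool, str]:
        result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        return result.returncode == 0, result.stdout + result.stderr

    def closed_loop(spec: str,
                    generate: Callable[[str], str],
                    refine: Callable[[str, str, str], str],
                    target: Path = Path("draft.py"),
                    max_iterations: int = 5) -> str:
        code = generate(spec)                  # first draft from the spec
        for _ in range(max_iterations):
            target.write_text(code)            # put the draft where the tests can see it
            ok, report = run_tests()
            if ok:
                return code                    # tests green: accept the draft
            code = refine(spec, code, report)  # feed the failure report back
        raise RuntimeError("no passing draft within the iteration budget")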
> All pieces are there, we just need to decide to do it.
Another silver bullet.
You've got to bite the bullet at one point and make the transition from open-loop to closed-loop. There is a compute cost associated to it, and there is also a tuning cost, so it's not all silver lining.
>Once you have a well written library or package, you are more than 95% there and you almost don't need AI to do the things you want to do.
That's an idealistic view. Packages are leaky abstractions that make assumptions for you. Even stuff like base language libraries - there are plenty of scenarios where people avoid them - they work for 9x% of cases but there are cases where they don't - and this is the most fundamental primitive in a language. Even languages are leaky abstractions with their own assumptions and implications.
And these are the abstractions we had decades of experience writing, across the entire industry, and for fairly fundamental stuff. Expecting that level of quality in higher level layers is just not realistic.
I mean just go look at ERP software (vomit warning) - and that industry is worth billions.
That's a perfect summary, in my opinion. Both junior devs and AI tools tend to write buggy and overly verbose code. In both cases, you have to carefully review their code before merging, which takes time away from all the senior members of the team. But for a dedicated and loyal coworker, I'm willing to sacrifice some of my productivity to help them grow, because I know they'll help me back in the future. But current AI tools cannot learn from feedback. That means with AI, I'll be reviewing the exact same beginner's mistakes every time.
And that means time spent on proofreading AI output is mostly wasted.
I'd say it's more like: every time you start a new conversation with him, it's like his first day on the job.
But also: within the span of one interaction with him, he advances from junior to senior engineer in your specific domain.
Only if your expectation of senior engineers is that they often hallucinate and you'll constantly have to double-check their work
The big advantage to me is it's an inexperienced junior with approximate knowledge of every API and style on the internet. It's a super junior.
Like economists, who have predicted 7 of the last 3 recessions, AI knows 17 out of 11 API calls!
And in the case it's wrong, I will know pretty quickly and can fall back to the old methods.
It's definitely been said before, but the fact that they're calling out these non-existent functions in the first place can tell library devs a lot about what new features could be there to take up the namespace.
TODO.md and FEATURE_TODO.md are also very valuable for keeping on track.
If you eliminate the need for juniors, you're never going to get more experienced people.
Juniors today can learn exponentially faster with LLMs and don't need seniors as much.
Take me for example: I've been programming for 20 years, been through C, C++, C#, Python, JS, PHP, but recently had to learn Angular 18 and FastAPI. Even though I knew JS and Python beforehand, these frameworks have ways of doing things I'm not used to, so I fumbled with them for the first 100 hours. However, when I finally installed Copilot and had a little faith in it, I boosted my productivity 3-4x. Of course it didn't write everything correctly, of course it used outdated Angular instead of the latest (which is why I was so reluctant to ask it anything at the start), but it still helped me a lot, because it is much easier (for me) to modify some bad/outdated code and get it to where I want it than to write it from scratch without the muscle memory of the new framework.
So for me it's been a godsend. For stuff that's not as cutting-edge as framework oddities that appeared in the last 12 months, I expect it to be even more helpful, and the percentage of correct output to be way higher. So for juniors doing, say, Python coding on frameworks that are at least 3-4 years old and stable enough, the seniors would need to intervene much, much less to correct the junior.
You are not a junior, you already rely on 20 years of experience.
The last time I did any sort of web development was 20 years ago, but I thought I'd try some C# (last touched ~10 years ago) + Blazor for an idea I had, and it took me a couple of days to feel comfortable and start making stuff. While I haven't written for the web in a very, very long time, my experience with other tech helped a lot.
And in the near future the mid/senior level will have no replacements as we've under-hired juniors and therefore don't have a pipeline of 5YOE/10YOE/etc devs who have learned to stop being juniors.
And it happily wrote something. When I proceeded to add an actual spec, it happily wrote something reasonable that couldn't work, because it assumed all 'is_something' functions can be used as guard statements. Uh oh.
Which seems to work pretty well, in my experience.
However the current generation of models needs a specific form of training set that is quite different from what a human would produce through direct interaction with the model.
For one it needs many more examples than a human would need. But also the form of the example is different: it must be an example of an acceptable answer. This way a model can compute how far it is from the desired outcome.
Further research into how to efficiently fine-tune models will narrow this gap, and perhaps senior devs will be able to give learnable feedback efficiently through their normal course of interaction.
Probably the more we sacrifice of our own productivity, the quicker they gain experience (and seniority), right? The only thing that confused me personally in your statement was that they would have to be loyal. Isn't that something one can only hope for, and that must be proven over time? Meaning that at the time you trust they'll turn out well, you have no way of proving they are "loyal" yet. Loyalty is nigh impossible to request upfront; you have to deserve it. And a lot can also go wrong along the way.
I think this also applies to AI: it's like having an early or intermediate senior engineer on your team.
So in effect it would mean having fewer engineers, probably 1 or 2 senior engineers at best, with the rest guiding the AI senior engineer through the codebase.
I didn't need to hire any senior engineers for a while for my SaaS, and only needed good juniors for 3 months.
Everyone in the future is going to have access to senior engineers building projects.
Just the dialog with an AI I find instructive. Sometimes it suggests things I don't know. Often after 1-2-3 mediocre AI solutions I'll code up something that re-uses some AI code but has much better code that I write.
The LLM costs a minute fraction of the cost of employing a human junior developer.
Not only that, but one who is infected with terminal Dunning-Kruger syndrome. Of all the things that LLMs are great at, demonstrating a hopeless case of Dunning-Kruger has to be at the very top.
There is a surprising lack of focus on code reviews as part of that process.
A few months back, I ran into one company (a YC company) that used code reviews as their first technical interview. Review some API code (it was missing validation, error handling, etc.), review some database code (missing indices, bad choices for ID columns, etc.), and more.
I think more companies need to rethink their interview process and focus on code reviews as AI adoption increases.
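For anyone who hasn't seen that interview format, the exercise tends to look roughly like the snippet below: code that "works" but is riddled with the gaps the candidate is expected to call out. The endpoint and in-memory store are illustrative, not any company's real test:

    # Review exercise sketch: spot the missing validation and error handling.
    from flask import Flask, request, jsonify

    app = Flask(__name__)
    USERS: dict[int, str] = {}

    @app.post("/users")
    def create_user():
        data = request.get_json()              # review: no check that a JSON body was sent at all
        email = data["email"]                  # review: KeyError if missing; no format or length validation
        user_id = len(USERS) + 1               # review: race-prone ID generation, no duplicate-email check
        USERS[user_id] = email                 # review: no persistence, no error handling
        return jsonify({"id": user_id}), 201   # review: nothing authenticates or rate-limits the caller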
Firstly there is the double edged sword of AI when learning. The easy path is to use it as a way to shortcut learning, to get the juice without the pressing, skipping the discomfort of not knowing how to do something. But that's obviously skipping the learning too. The discomfort is necessary. On the flip side, if one uses an llm as a mentor who has all the time in the world for you, you can converse with it to get a deeper understanding, to get feedback, to unearth unknown unknowns etc. So there is an opportunity for the wise and motivated to get accelerated learning if they can avoid the temptation of a crutch.
The less tractable problem is hiring. Why does a company hire junior devs? Because there is a certain proportion of work which doesn't take as much experience and would waste more senior developers time. If AI takes away the lower skill tasks previously assigned to juniors, companies will be less inclined to pay for them.
Of course if nobody invests in juniors, where will the mid and senior developers of tomorrow come from? But that's a tragedy of the commons situation, few companies will wish to invest in developers who are likely to move on before they reap the rewards.
Weird times ahead, probably, but we will be fine, mostly.
To add on to this point, there's a huge role of validation tools in the workflow.
If AI-written Rust code compiles and the test cases pass, it's a huge positive signal for me, because of how strict the Rust compiler is.
One example I can share is
https://github.com/rusiaaman/color-parser-py
which is a Python binding of Rust's csscolorparser, created by Claude without me touching an editor or terminal. I haven't reviewed the code yet; I just ensured that the test cases really passed (on GitHub Actions), installed the package, and started using it directly.
If it's so important to test isinstance(r, int) then you should also have tests for g and b, and likely similar tests for the floats.
Is it really worthwhile to 'Convert back to ints and compare' when you know the expected rgba floats already?
Also, all the checks of "if u8 < 255" will make me not want to use this library with a 10-foot pole. It screams "ai" or "I don't know what I'm doing" so much.
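Roughly the symmetric checks meant here, as a sketch only; parse() and its (r, g, b, a) return shape are assumptions for illustration, not the binding's actual API:

    # Sketch: compare every channel against known values and type-check all of
    # them, not just r. parse() is hypothetical.
    def check_color(parse):
        r, g, b, a = parse("rgb(16, 32, 48)")
        assert (r, g, b) == (16, 32, 48)                      # known values, compared directly
        assert all(isinstance(c, int) for c in (r, g, b))     # type-check every int channel
        assert isinstance(a, float) and 0.0 <= a <= 1.0       # and the float alpha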
The second is an artefact of a test case failing, which it tried to fix using this check. Thankfully not a correctness failure, only an optimisation issue.
You're right though it's not worth publishing for general public.
Publishing the repo is worth it, because it showcases what the AI can (and cannot) do, and it is not there yet. But as a real package on PyPI, indeed less so.
Or, JPype uses reflection to generate run-time bindings from Python to Java.
Why does it require AI and all of this infrastructure?
This is a case where LLMs can be good, because they can infer quite a bit about the needed scaffolding, and can generate bindings between languages where no such tool exist. We don't need to write a tool for every combination of languages (or adapt one like SWIG to add a language).
But in the long run, depending on how many bindings we need, the cost of each binding may end up outweighing the upfront cost of a human writing the tool.
I didn't explain myself well enough then. In the specific case of Rust, why doesn't Rust already have a way to tell tools what the API is, and why aren't there already tools which convert that API to bindings for other languages?
Ideally, "import rust_ffi; color_lib = rust_ffi.load_library('csscolorparser')"
We already saw from pcwelder's other responses that this AI generation required human involvement ("me asking it to return a 4 tuple instead of a list").
We can also see style questions, like how is_valid is a static method on the ColorParser class, rather than a module-level function as I would have done, so the API intrinsically cannot be automated.
Which I knew from my own experience in developing Python bindings to C libraries. If you want it to feel like Python, rather than C, you will have to put in the work. (In my case, to add class structure, to hook into Python's garbage collection, to match the iterator protocol, and so on.)
The biggest reason that we don't see formal verification being used more is that it takes some time to formally define the output of an algorithm and significantly more time to convince a theorem prover that your code is correct according to the pre- and post-conditions. If we can get LLMs to work well alongside formal verification tools then I think that would greatly improve their usefulness.
Here's a made up and trivial example of how I imagine this might look:
User: Write me a binary search over "type A_Type is array (Integer range <>) of Integer;"
LLM: Implementing the following:
function Search (A : A_Type; Target : Integer) return Integer with Pre => Sorted (A) and Target in A, Post => A(Search'Result) = Target;
First implementation:
    function Search (A : A_Type; Target : Integer) return Integer is
       Left  : Integer := A'First;
       Right : Integer := A'Last;
    begin
       while Left <= Right loop
          declare
             Mid : Integer := (Left + Right) / 2;
          begin
             if A (Mid) = Target then
                return Mid;
             elsif A (Mid) < Target then
                Left := Mid + 1;
             elsif A (Mid) > Target then
                Right := Mid - 1;
             end if;
          end;
       end loop;
    end Search;
Compiler: Phase 1 of 2: generation of Global contracts ...
Phase 2 of 2: flow analysis and proof ...
wrapper.adb:12:36: medium: overflow check might fail, cannot prove lower bound for Left + Right
12 | Mid : Integer := (Left + Right) / 2;
| ~~~~~~^~~~~~~~
reason for check: result of addition must fit in a 32-bits machine integer
wrapper.adb:12:45: info: division check proved
wrapper.adb:14:19: medium: array index check might fail
14 | if A (Mid) = Target then
| ^~~
reason for check: value must be a valid index into the array
LLM: Looks like there's a possible overflow for very large arrays, attempting new solution...Etc. until a valid solution is found and the result is returned to the user.
As an aside, this bug was present in Java for 9 years and would have been caught by formal verification (which admittedly was more difficult at the time): https://research.google/blog/extra-extra-read-all-about-it-n...
The reason is probably that it is too tedious/difficult and you need some rare skills to do it. And mostly, the categories of bugs it eliminates are not problematic enough. Either way, the number of people capable of writing code vastly outnumbers the people capable of formally verifying that code. I know a lot of programmers without computer science backgrounds who have definitely never been exposed to any of this. I have been exposed to some of it, but that was 25 years ago, and the person teaching me lived out his career in academia without ever working on real code that mattered. A lot of this stuff is rather academic and esoteric.
Of course, LLMs could change this a quite a bit. A lot of programming languages are optimized for humans. Lots of programmers prefer languages that sacrifice correctness for flexibility. E.g. static typing is the simplest form of adding some formal verification to a language and a lot of scripting languages get rid of that because the verification step (aka. compilation) is somewhat tedious and so is having to spell out your intentions. Python is a good example of a language that appeals to people without a lot of formal training in programming. And some languages go the other way and are harder to use and learn because they are more strict. Rust is a good example of that. Great language. But not necessarily easy to learn.
With LLMs, I don't actually need to learn a lot of Rust in order to produce working Rust programs. I just need to be able to understand it at a high level. And I can use the LLM to explain things to me when I don't. Likewise, I imagine I could get an LLM to write detailed specifications for whatever verifiers there are and even make helpful suggestions about which ones to pick. It's not that different from documenting code or writing tests for code. Which are two things I definitely use LLMs for these days.
The point here is that LLMs could compensate for a lack of trained people that can produce formal specifications and produce larger volumes of such specifications. There's probably a lot of value in giving some existing code that treatment. The flip side here is that it's still work and it's competing with other things that people could spend time on.
That Java issue you mentioned is an example of something that wasn't noticed for 9 years; probably because it wasn't that big of a problem. The value of the fix was lowish and so is the value of preventing the problem. A lot of bugs are like that.
Python is easy because it lets you get somewhere: the inputs will roughly be the set of acceptable inputs, so the output will be as expected, and you can tweak as things go (much faster for scripting tasks). But when you need a correct program that has to satisfy some guarantees, that strategy no longer cuts it, and suddenly you need a lot more knowledge.
I don't think an LLM would cut it, because it doesn't understand ambiguity and how to chisel it away so that only the most essential understanding remains.
Those who seem to get the best results ask for a prototype or framework for how to do something. They don't expect to use the AI-generated code; it's purely there as inspiration and something they can poke at to learn about a problem.
Most seem to have a bad experience. Either the LLMs don't actually know much, if anything, about the subject, and make up weird stuff. A few colleagues have attempted to use LLMs for generating Terraform or CloudFormation code, but have given up on making it work. The LLMs they've tried apparently cannot stop making up non-existent resources. SRE-related code/problems anecdotally seem to do worse than actual development work, but it feels like you still need to be a fairly good developer to get much benefit from an LLM.
The wall we're hitting may be the LLMs not actually having sufficient data for a large set of problems.
That's what GitHub and sample projects are here for. And the examples would be working ones. No need to train a model for that.
Maybe the fact that I was just a kid made this different, but I guess my point is that just because AI can now write you a code file in 10 seconds, doesn't mean your learning process also got faster. It may still take years to become the developer that writes well-structured code and thinks of edge cases and understands everything that is going on.
When I imagine the young people that will sit down to build their own first thing with the help of AI, I'm really excited knowing that they might actually get a lot further a lot faster than I ever could.
GenAI can get deeper into a solution that consists of well known requirements. Like basic web application construction, api development, data storage, and oauth integration. GenAI can get close to 100%.
If you’re trying to build something that’s never been done before or is very complex, GenAI will only get to 50% and any attempt to continue will put you in a frustrating cycle of failure.
I’m having some further success by asking Claude to build a detailed Linear task list and tackling each task separately. To get this to work, I’ve built a file combining script and attaching these files to a Claude project. So one file might be project-client-src-components.txt and it contains all the files in my react nextjs app under that folder in a single file with full file path headers for each file.
We’ll see how deep I get before it can’t handle the codebase.
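The combining script itself doesn't need to be much more than this sketch (the folder, extensions, and output name are placeholders):

    # Walk a source folder and concatenate every matching file into one text
    # file, with a full-path header above each one.
    from pathlib import Path

    ROOT = Path("client/src/components")
    OUTPUT = Path("project-client-src-components.txt")
    EXTENSIONS = {".ts", ".tsx", ".js", ".jsx", ".css"}

    with OUTPUT.open("w", encoding="utf-8") as out:
        for path in sorted(ROOT.rglob("*")):
            if path.is_file() and path.suffix in EXTENSIONS:
                out.write(f"\n===== {path} =====\n")          # full file path header
                out.write(path.read_text(encoding="utf-8"))
                out.write("\n")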
But for a really tricky logic problem, accurately explaining it in English to an LLM might be less natural than just writing the code.
50% seems exceeding high for 'never done before'.
Where are these people in real life? A few influencers or wannabes say that on Twitter or LinkedIn, but do you know actual people in real life who say they’re "dramatically more productive with AI"?
Everyone I know or talked to about AI has been very critical and rational, and has roughly the same opinion: AIs for coding (Copilot, Cursor, etc.) are useful, but not that much. They’re mostly convenient for some parts of what constitutes coding.
I've never used electron and I got a small electron project up and running considerably faster than I would have otherwise.
I did some consulting for a project written in Vue, and I know React; I got a good summary of the differences in the concepts, how to lay out and structure files, etc. I had to modify a PHP project that was hosting Vue, and I used ChatGPT to point me to where in the project I needed to look to make the changes in the code. Just this morning I needed to use git bisect but couldn't remember the exact syntax; I could have googled it and gone through the verbose documentation, the Stack Overflow reply, or a long blog post. Instead, I got exactly what I needed back in seconds. I had to develop a migration plan for another project; I already had a rough idea of what to do, but I asked ChatGPT anyway, because why not, it takes seconds. It came up with what I had thought of already, some things I didn't need, and some things I hadn't thought of.
And that's just on the coding side, even more on the actual start up side!
The quality of their work has gone down dramatically since they started being "faster", and it needs rewriting way more often than before LLMs existed. But they do claim they are now much faster.
You can move fast and still do things well if you set it up correctly at the start. Specifically with things like automated testing and so on.
I dunno. I responded to a post that had a clear question. Either my reply is on the wrong post, or the text changed.
It has been utterly game-changing and time-saving. I question each new bid I do: Should I assume AI will continue to help this much, and adjust my pricing?
Something like creating an ORM database table class, where all the fields are predictable and it's just a matter of writing out the right fields in the right format.
It's also replaced the really stupid Stack Overflow queries, when Ruby and Python and Javascript have corrupted my brain and I can't remember whether it's bar.lower() or bar.lowercase() or bar.lcase() or lower(bar) or whatever.
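The ORM case is a good example of that "predictable fields" sweet spot; e.g. a plain SQLAlchemy table class, with illustrative columns rather than anyone's real schema:

    # Formulaic model class: exactly the kind of predictable boilerplate an
    # assistant can bang out reliably.
    from sqlalchemy import Column, DateTime, Integer, String, func
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class Customer(Base):
        __tablename__ = "customers"

        id = Column(Integer, primary_key=True)
        name = Column(String(120), nullable=False)
        email = Column(String(255), unique=True, nullable=False)
        created_at = Column(DateTime, server_default=func.now())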
Yesterday I asked o1-preview (the "best" reasoning AI on the market) how could I safely execute untrusted JavaScript code submitted by the user.
AI suggested a library called vm2, and gave me fully working code example. It's so good at programming that the code runs without any modifications from me.
However, then I looked up vm2's repository. It turns out to be an outdated project, abandoned due to security issues. The successor is isolated-vm.
The code the AI gave me is 100% runnable. Had I not googled it, no amount of unit tests would have told me that vm2 is not the correct solution.
Using it as a smarter autocomplete is where I see a lot of the productivity boost. It replaces snippets, it completes full lines or blocks, and because verifying a block likely takes less time than writing it, you can easily get a 100%+ speed-up.
It wrote functions to separately generate differential equations for water/air side, and finally combined them into a single state vector derivative for integration. Easy peasy, right?
No. On closer inspection, the heat transfer equations had flipped signs, or were using the wrong temperatures. I'd also have preferred to have used structured arrays for vectors, instead of plain lists/arrays.
However, the framework was there. I had to tweak some equations, prompt the LLM to re-write state vector representations, and there it was!
AI-assisted coding is great for getting a skeleton for a project up. You have to add the meat to the bones yourself.
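For anyone curious, the sign-convention issue being described boils down to something like this; a minimal sketch of the coupled state-vector pattern, with made-up coefficients, using scipy's solve_ivp:

    # Lumped two-stream heat exchange: the heat the water loses is the heat the
    # air gains, so the same flow term appears with opposite signs.
    from scipy.integrate import solve_ivp

    UA = 50.0              # heat transfer coefficient * area [W/K] (made up)
    C_WATER = 8360.0       # water-side thermal capacity [J/K] (made up)
    C_AIR = 1500.0         # air-side thermal capacity [J/K] (made up)

    def derivative(t, state):
        t_water, t_air = state              # state vector: water temp, air temp [K]
        q = UA * (t_water - t_air)          # heat flow from water to air [W]
        return [-q / C_WATER,               # water loses heat: negative sign
                +q / C_AIR]                 # air gains heat: positive sign

    sol = solve_ivp(derivative, (0.0, 600.0), [360.0, 290.0])
    print(sol.y[:, -1])                     # the two temperatures approach each other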
My feeling is that this stuff is not bottle-necked on model quality but on UX. Chat is not that great of an interface. Copy pasting blobs of text back to an editor seems like it is a bit monkey work. And monkey work should be automated.
With AI interactions now being able to call functions, what we need is deeper integration with the tools we use. Refactor this, rename that. Move that function here. Etc. There's no need for it to imagine these things perfectly it just needs to use the tools that make that happen. IDEs have a large API surface but a machine readable description of that easily fits in a context window.
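Concretely, the machine-readable description could be as simple as function-calling tool specs; the names and fields below are illustrative, not any IDE's or vendor's actual schema:

    # Hypothetical editor tools exposed to a function-calling model.
    EDITOR_TOOLS = [
        {
            "name": "rename_symbol",
            "description": "Rename a symbol across the project.",
            "parameters": {
                "type": "object",
                "properties": {
                    "file": {"type": "string"},
                    "old_name": {"type": "string"},
                    "new_name": {"type": "string"},
                },
                "required": ["file", "old_name", "new_name"],
            },
        },
        {
            "name": "apply_patch",
            "description": "Apply a unified diff to a file and report success.",
            "parameters": {
                "type": "object",
                "properties": {
                    "file": {"type": "string"},
                    "diff": {"type": "string"},
                },
                "required": ["file", "diff"],
            },
        },
    ]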
Recently chat gpt added the ability to connect applications. So, I can jump into a chat, connect Intellij to the chat and ask it a question about code in my open editor. Works great and is better than me just copy pasting that to a chat window. But why can't it make a modification for me? It still requires me to copy text back to the editor and then hope it will work.
Addressing that would be the next logical step. Do it such that I can review what it did and undo any damage. But it could be a huge time saver. And it would also save some tokens. Because a lot of code it generates is just echoing what I already had with only a few lines modification. I want it to modify those lines and not risk hallucinating introducing mistakes into the rest, which is a thing you have to worry about.
The other issue is that iterating on code gets progressively harder as there's more of it and it needs to regenerate more of it at every step. That's a UX problem as well. It stems from the context being an imperfect abstraction of my actual code. Applying a lot of small/simple changes to code would be much easier than re-imagining the entire thing from scratch every time. Most of my conversations the code under discussion diverges from what I have in my editor. At some point continuing the conversation becomes pointless and I just start a new one with the actual code. Which is tedious because now I'm dealing with ground hog day of having to explain the same context again. More monkey work. And if you do it wrong, you have to do it over and over again. It's amazing that it works but also quite tedious.
I agree wholeheartedly and that's why I recommend cursor to the point I'm being called a shill for them. I have no relationship to them, but they've shipped the first product that actually addresses this!
They have a "small model" that takes a suggestion from the chat mode (provided by Claude 3.5 usually, but o1 / 4o also work) and "magic merges" it into your codebase at the click of a button. It feels like such an easy task, but I bet it's not, and a lot of tinkering went into it and the small model they use. But the UX results are great. You start a chat, frame the problem, get an answer, hit "apply" and watch it go line by line and incorporate the changes into your existing code.
Give it a try.
You might know this already, but if you're using the chatbot interfaces it helps quite a bit to prompt it with something along the lines of "only give me the bits that changed". There is nothing worse than fine-tuning a tiny bit of some code you didn't bother writing yourself only to have the bot give you an entire prompt's worth of code.
Tools like this are alright if your expectations of an IDE are low. E.g. if you are happy using just VS Code or whatever. Unfortunately I'm used to a bit more than that. Jetbrains has been doing some of their AI stuff. But I haven't really looked at it that much.
Don't get me wrong; I actually use VS Code for some stuff. But it's just not a replacement for intellij. Not even close. So, not really looking for IDEs with even less features just so I can have some AI.
But the fact remains that if you did all those things yourself, although much slower and more frustrating at times, you would have learned and remembered more, and would have understood the task, the libraries, and your own program on a deeper level.
Which many times is the actual point of the exercise.
I would not be confident betting a career on any of those patterns holding. It's like people hand-optimising their assembly back in the day. At some point the compilers get good enough that the skill is a curio rather than an economic edge.
EDIT : http://www.antipope.org/charlie/blog-static/fiction/accelera...
Especially the first 3 chapters, set in the near future (which is ~today now).
Models have been getting better, at a fast clip. With occasional leaps. For decades.
The fact that we are even talking about model coding limitations greatly surpasses expectations for 2024 from just a few years ago.
Progress in steps & bounds isn’t going to stop short of a global nuclear winter.
We all expect just about every technology to get better eventually, but you may notice some things seem to be taking decades.
Edit: realized I replied as if Nevermark was the source of the first post, so just note that's not the case.
100% agree. I am testing o1 on some math problems. I asked it to prove that the convolution of two Gaussians is Gaussian. It gave me a 3-page algebraic solution; it is correct, but neither elegant nor good. I have seen far more ingenious solutions. These tools are really good at doing something, but not at doing it like an expert human, as claimed.
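For reference, one standard shorter route (the usual textbook argument, not necessarily the one the model was expected to find) goes through the Fourier transform: convolution becomes multiplication, and the transform of a Gaussian is again a Gaussian,

    \widehat{f * g}(\omega) = \hat{f}(\omega)\,\hat{g}(\omega), \qquad \widehat{\mathcal{N}(\mu,\sigma^2)}(\omega) = e^{i\mu\omega - \sigma^2\omega^2/2},

so the product of the two transforms is e^{i(\mu_1+\mu_2)\omega - (\sigma_1^2+\sigma_2^2)\omega^2/2}, which is the transform of \mathcal{N}(\mu_1+\mu_2, \sigma_1^2+\sigma_2^2); inverting gives the result in a few lines instead of three pages.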
The goalposts are moving at the speed of light.
A few years ago, if someone had told us that you could ask a computer to compose a poem, critique a painting, write a professional 360 performance review based on notes, design a website from a napkin sketch, prove convolution theorems, ... we would have said that's a stretch even for sci-fi.
Now we have a single LLM that can do all of that, at some level of quality. Yet, the solutions are not elegant enough, the code not completely correct, the design is not inspired and the poem is "slop".
Nobody's saying that these aren't fascinating, just that it's not looking like their models are getting significantly better and better as all the hype wants you to believe.
Transformers + huge data set is incredible. But literally we've scraped all the data on the web and made huge sacrifices to our entire society already
There's no thought or reasoning behind anything LLMs generate, it's just a statistical pile of stuff. It's never going to generate anything new. It literally can't.
However, they are really good at highlighting just how many people will believe nonsense stated confidently.
It can't on its own. But why does it need to? As a tool, the user can provide the insight, imagination, soul, or guidance.
And let's be honest, very little in our life, work, entertainment or science is completely new. We all stand on the shoulders of giants, remix existing work and reinterpreting existing work.
If it's the latter, then agreed, it doesn't produce anything new; but neither does most of humanity, and it doesn't need to in order to be of assistance.
ScreenshotToCode wants me to pay for a subscription, even before I have any idea of its capabilities. V0 keeps throwing an error in the generated code, which the AI tries to remedy, without success after 3 tries. Bolt.New redirects to StackBlitz, and after more than an hour, there are still spinners stuck on trying to import and deploy the generated code.
Sounds like snake oil all around. The days of AI-enabled low-/no-code are still quite a while away, I think, if at all feasible.
To me it's about asking the right questions, no matter the level of experience. If you're a junior, it will help you ramp up at a speed I could not have imagined before.
In my case, ChatGPT was quite useful to bootstrap a code for working with Apple's CoreAudio and sucked when I tried to make it write some code to do something special to my app.
I got the same experience when I tried to hammer out an app dealing with cryptography (certificates, keys, etc.) and was only able to solve the issue through Apple's developer forums, thanks to the infamous eskimo1. Then some Russian dev who had the same issues appeared and gave me the full context. ChatGPT, Claude, etc. can't put together Swift code that does things usually done in C rather than Swift, and can't leverage their knowledge of cryptography to help themselves out.
I had similar problems in Node.js etc. too; the instant you do anything non-standard, the AI falls apart and you can see how stupid it actually is.
A good way to tell whether a PR opened by someone on your team was LLM-generated is to look for inane explanatory comments like:
// Import the necessary dependencies
import { Foo } from './Foo';
// Instantiate a new Foo
const myFoo = new Foo();
// return the expected value
return true;
I don't know if this is because a lot of the training material consists of blog posts and documentation, which focus more on explaining how trivial cases work than on "real world" examples, and thus the LLM has "learned" that this is what real code looks like.
In my experience, from a few juniors I've cooperated with, they get absolutely awful code from ChatGPT. Things like manually opening and closing files with unnecessary exception handling, crudely reimplementing stuff that's already in the standard library and should have been a one-liner. And sure, ChatGPT will happily suggest explanatory comments and whatnot, but it's like copying in material from a reference manual, i.e. provocatively useless.
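(To make that pattern concrete, here is a hypothetical TypeScript rendition; the function names and the config-file scenario are mine, not the commenter's.)

// The verbose, hand-rolled style described above: manual open/read/close
// with exception handling that just rethrows.
import * as fs from 'fs';

function readConfigVerbose(path: string): string {
  let fd: number | undefined;
  try {
    fd = fs.openSync(path, 'r');
    const stats = fs.fstatSync(fd);
    const buffer = Buffer.alloc(stats.size);
    fs.readSync(fd, buffer, 0, stats.size, 0);
    return buffer.toString('utf8');
  } catch (err) {
    // Pointless handler that adds nothing.
    throw err;
  } finally {
    if (fd !== undefined) fs.closeSync(fd);
  }
}

// The standard-library one-liner it should have been.
const readConfig = (path: string): string => fs.readFileSync(path, 'utf8');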
To me it also seems like they don't learn much from it either. They've made much more progress from a bit of mentoring and heavy use of REPLs and similar shells.
I do many other things as a software engineer; writing code was always a small part of it, but a time-consuming one.
The second most time-consuming thing is meetings and explaining things to non-technical people, something like: "No Jerry, we cannot just transfer 100GB of data to the WebApp in each user's browser for faster searching while also having it 'real time' updated."
I'm not learning, just forgetting. Entirely different skills - exercise is important.
In fact, I built a tool [1] that applies this principle for semi-automated coding. It uses LLMs for generating code, but leaves the context selection and code editing for the human to complete.
I find this a sweet spot between productivity and quality of output.
It doesn't matter how many times you show them that it invented assembly instructions or wxWidgets functions; they insist on cheating. I even told them the analogy of going to the gym: you lift with your own strength, you don't use a crane.
And of course, it is evident when you receive students who don't know what a function is, or who cannot complete simple exercises during a written test.
We learned by reading, lots of trial and failing, curiosity, asking around, building a minimal reproducible bug for stackoverflow... they (the ones that rely only on chatgpt and not their brain) cannot even formulate a question by themselves.
Even if they do manage to cheat their way to a degree, they won't be able to pass interviews or fulfill their role at a job.
Wouldn't the compiler complain in those cases anyway? Why not let them burn through their mistakes, that's how people tend to learn best :-P
And once you're in that groove, and have built the intuition on what works and what doesn't and where you should challenge or follow up, productivity really goes up quite a bit.
To quote Memento: "Don't believe his lies." (though thankfully, as the LLMs advance, this is becoming less of an issue. Claude 3.5 Sonnet V2 is already a huge step ahead compared to where it once was).
That's not "the good stuff", you've just turned it into robo-Clever Hans.
Then the LLM would take this and redo its answer. Unlike internet strangers, who usually respond unhelpfully to adversarial exchanges because they tend to be too fixated, for my purposes, on finding ways to make their original answer right.
This. Junior devs are f*cked. I don't know how else to say it.
What I realized is that the above quote, and what follows in the article, was true before AI as well. Juniors always missed things that made code production-ready.
I don't think AI in the hands of juniors will create worse code. I think it will spawn a lot more code that is as bad as it was before.
This is still fresh thinking, so I don't have a satisfying conclusion. :)
And that makes it even harder for seniors to teach: it was always hard to figure out where someone has misconceptions, but now you need to work through more code. You don't even know if the misconceptions are just the AI misbehaving, the junior doing junior things, or if the junior should read up on certain design principles that may not be known to the junior yet. So, you end up with another blackbox component that you need to debug :)
I guess the question is whether this role is still employable.
Before merge and deploy you need to run the code. If it fails, what do you do? Search for a new snippet on Stack Overflow, or paste the error into the AI?
This seems like a completely inefficient way of learning to me.
Our company has a culture of expecting the person who wrote the code to support it, and so if it’s poorly written, they inevitably have to learn to fix it, and build it back in a way that can prevent issues in the future.
Obviously care has to be taken to assign the right projects with the right level of guidance and guard rails but when done well, people learn quickly.
I think the same spirit can be applied to AI generated code
Of course, the question remains of whether companies that buy into AI will hire a sufficient stream of junior developers so that some of them will graduate into seniors.
I had a realisation today: AI gives you more shots at goal. When I was learning (the game) go, being able to play a lot of games against the computer in quick succession built intuition faster. When coding I can try more ideas, faster. Kids these days can make apps faster than you can have lunch. They’ll do it differently but they’ll learn faster than we will.
My son is currently studying engineering, and whenever he is stuck on anything (construction, math, programming, mechanics) he fires up ChatGPT and asks for help. In maybe 90% of cases the AI gives him a hint so he can continue, which is an extremely short feedback cycle. In the remaining 10% he has to ask his human tutor at university, who is usually available a few days later. And he is not blindly following the AI's advice, but rather picking ideas from it. It is actually pretty awesome to see what opportunities there are, if AI is not simply used for cheating.
One impressive example he showed me was feeding a screenshot of a finished (simple) construction drawing to the AI and asking for potential errors. The AI replied with a few useless, but also with one extremely helpful suggestion, which helped him eradicate the last mistake in his drawing. I am still not sure if those were generic suggestions or if the AI was able to interpret the drawing to a certain degree.
This experiment made me think, maybe most of the benefit from AI comes from this mental workload shift that our minds subconsciously crave. It's not that we achieve astronomical levels of productivity but rather our minds are free from large programming tasks (which may have downstream effects of course).
You don't need a baker to make bread, but you need a "food technician" to monitor the bread making machines.
So I wouldn't blame the machines for bad bread.
But I don't really have the insights, so I'd love an introduction.
It's just that German consumers demand a certain level of quality in their sourdough, and the market is big enough for people to build machines to deliver that quality at a good price.
Yes, bread here in Singapore is a bit sad. (But we got lots of other great food options to make up for that.)
The technicians are monitoring the results. I don't think the analogy is any good.
Whereas GPTs are built to be as unrepeatable as possible with non-deterministic controls.
So you end up with factory white bread - amazingly fluffy, stores for unusually long without going stale and very shelf stable without refrigeration, has little to no nutrition, but tastes amazing.
It's because the type of product that is suited for industrialized, low-skill but highly automated production is a very different product from artisanal production (which is what sourdough is; you can't easily use automation for sourdough). I reckon AI-coded products will have similarities.
They'd happily mass produce whatever type of bread people wanted, with whatever method people wanted, if they'd pay for it.
Instead, people buy the cheap sandwich bread.
No, it's full of carbohydrates; it might be lacking B vitamins and whatnot, but energy/nutrition it does have. In some cases it is sweetened with sugar (so both glucose and fructose).
What you describe is mostly the demand side, though; in northern Europe dark bread (incl. rye) is common, for instance.
What’s amazing to one person might not be to another. Is it rich and nuanced like a well-made, hand-crafted German sourdough bread (hard to get these days, nearly impossible in South Africa), or just overly sweet and processed?
But will we need the entire army of "coders" that we currently have?
German factory-produced sourdough bread is good. (It's better than lots of artisanal stuff in other countries.)
The bread I make at home is orders of magnitude better than any of the bland, dead, oversalted bread I get at local supermarkets in the UK. And I'm nowhere near a baking enthusiast; I just make it at home so I can eat tolerably good bread.
But try, say, France, or Italy, or Spain, or Greece. Just go to a bakery, if you can figure out which ones make the dough in house (in France there are rules for this). And then we can talk about mass-produced German sourdough.
Although I bet the Germans make great pumpernickel.
(They had some great Ficelle at a small bakery in one of those markets under the rail arches near the Bermondsey beer mile. I think it was 'Little Bread Pedlar', but don't quote me on this. Their other baked goods were also tasty. But this was in 2017.)
> The bread I make at home is order of magnitudes better than any of the bland, dead, oversalted bread I get at local supermarkets in the UK.
Interesting that you complain about oversalting. We put quite a lot of salt into our wheat/rye-mixed sourdough here in Singapore; partially for taste but also partially to retard the rapid fermentation you get in the local climate.
> But try, say, France, or Italy, or Spain, or Greece. Just go to a bakery -if you can figure out which ones make the dough in house (in France there are rules for this). And then we can talk about mass-produced German sourdough.
You can also get artisanal bread in Germany, and you can get arbitrarily fancy there. If you are in Berlin, try Domberger Brotwerk. (Their yeasted cakes and (open) sandwiches are also great.)
You can get decent-ish bread in the countries you mentioned, though I think it's all rather white and wheat-y? I prefer at least some rye mixed in. (So I might prefer a German factory produced Mischbrot over an artisanal white wheat; even though the latter might be a better example of its style.)
My point is not that German factory produced bread is the best bread ever. It is not. My point is that it's decent. Decent enough to deny the statement 'and yet the food industry hasn't mastered the sourdough.'
Well, OK, you can find good bread if you get lucky and look for it really hard, but the thing is that the British don't really understand what good bread means. I'm sorry to be racist. I find the same thing about coffee and about most food. The British... they try, right? London is full of posh restaurants. But I really don't think they get it.
>> You can get decent-ish bread in the countries you mentioned, though I think it's all rather white and wheat-y?
You get a whole lot more than "decent-ish" bread in the countries I mentioned! And you don't need to go looking for "artisanal" bread. To my understanding that's a term that's applied to bread made in the UK or US because ordinary bread sucks. But the same is not needed in, e.g., France where there are rules for "pain tradition" ("bread made to tradition"; nothing to do with BDSM :| ) that basically enforce that the bread is made by the baker on the day it is sold. This is a French language site that explains the rules:
https://www.laculturegenerale.com/difference-pain-baguette-t...
To summarise, the dough can't be refrigerated, the bread must be baked on premise and then there's some restrictions on the ingredients (e.g. no additives except fungal amylase).
Btw having rules like that is a very French thing. The French (well, some of them) are very picky about their food and so they have all sorts of standards like AOP (which was a French thing before it was an EU thing) for cheese, wine, pork products and everything else that you can eat really. And that's a good thing and it works: you really should try the bread in France. I get the feeling you haven't - no offence.
Other places like Italy and Greece may not have the same stringent rules, so you find more variation (as in all things; e.g. coffee: good in Italy and Greece, passable in France, I wouldn't drink it in Belgium or Germany), but for whatever historical and cultural conditions in those countries you're very likely to get very good bread in any random bakery you walk into.
Like you say, white is the mainstay, but in Greece I find that has changed a good deal in the last few years. Even out in the boondocks where I stay you can find six or seven varieties of bread per bakery, with white the minority really. My local area has three bakeries, wall to wall, and two of them sell wholemeal, spelt and rye, with and without sourdough. That's partly thanks to the many Albanians who have migrated to Greece in the last few decades and who are master bakers (and stone masons to boot). Also: heavenly pies. Oh man. Now I want one of the "kourou" spinach pikelets with spelt from the Albanian bakery and I'm stuck in the UK :(
Btw, that Albanian bakery also makes bread without salt, in a couple of different varieties. I've tried their wholemeal sourdough (I have family with health issues, so). Not great, but eh, it's without salt. Greece gets very hot in the summer (40+ degrees is unsurprising), yet the salt-less bread works just fine. After all, this is modern times: we can control the temperature and humidity of enclosed spaces, yes? Salt is not needed for preservation anymore; it's now only there for the taste. So I'm very suspicious of industries that claim they can't reduce the salt content of their products "because preservation". As far as I'm concerned, any such claim makes me suspect a cover-up; specifically, that extra salt is used to cover up poor ingredients and poor production.
The term you are looking for might be something like 'culturalist'?
> France where there are rules for "pain tradition" ("bread made to tradition"; nothing to do with BDSM :| ) that basically enforce that the bread is made by the baker on the day it is sold.
Yes, but that's still white wheat bread.
> To summarise, the dough can't be refrigerated, the bread must be baked on premise and then there's some restrictions on the ingredients (e.g. no additives except fungal amylase).
We do some of these things at home, they don't prevent you from making good bread.
> Btw having rules like that is a very French thing. The French (well, some of them) are very picky about their food and so they have all sorts of standards like AOP (which was a French thing before it was an EU thing) for cheese, wine, pork products and everything else that you can eat really. And that's a good thing and it works: you really should try the bread in France. I get the feeling you haven't - no offence.
I've had French bread. It's good for what it is, but it's rather limited. They don't even like rye.
These mandatory rules seem a bit silly to me. (The Germans also really like them.) If you want to make something that conforms to some arbitrary rules, you should be allowed to and be allowed to label it as such, but other people should also be allowed to use whatever ingredients and processes they like.
(I'm still sore about Bavaria forcing their beer purity law on our tasty North Germany beers. But I guess that was the concession we made to get them to join the Prussian-led German Reich.)
> Btw, that Albanian bakery also makes bread without salt.
Yeah, that's a mistake in my opinion.
> Not great but eh, it's without salt.
You seem to think being without salt is a benefit?
(From what I can tell, there are some people with specific health problems for whom salt might be a problem. But normal healthy people do just fine with salt, as long as they drink enough liquids---which the salt makes you want to do naturally anyway. Salt is especially important in your diet if you sweat a lot.)
> After all, this is modern times: we can control the temperature and humidity of enclosed spaces, yes? Salt is not needed for preservation anymore, it's now only there for the taste.
Well, if you want to live in harmony with the local environment, you'll go with salt rather than aircon. So in addition to helping slow down the fermentation, the salt and sourness also help our bread last longer once it's baked here in Singapore.
Salt is tasty. (Up to a limit, of course.)
I like the ideas presented in the post but it’s too long and highly repetitive.
AI will happily expand a few information dense bullet points into a lengthy essay. But the real work of a strong writer is distilling complex ideas into few words.
The hard truth is that you will learn nothing if you avoid doing the work yourself.
I'm often re-reading ewd-273 [0] from Dijkstra, The programming task considered as an intellectual challenge. How little distance have we made since that paper was published! His burning question:
> Can we get a better understanding of the nature of the programming task, so that by virtue of this better understanding, programming becomes an order of magnitude easier, so that our ability to compose reliable programs is increased by a similar order of magnitude?
I think the answer AI-assistants provide is... no. Instead we're using the "same old methods," Dijkstra disliked so much. We're expected to rely on the Lindy effect and debug the code until we feel more confident that it does what we want it to. And we still struggle to convince ourselves that these programs are correct. We have to content ourselves with testing and hoping that we don't cause too much damage in the AI-assisted programming world.
Not my preferred way to work and practice programming.
As for, "democratizing access to programming..." I can't think of a field that is more open to sharing it's knowledge and wisdom. I can't think of a field that is more eager to teach its skills to as many people as possible. I can't think of any industry that is more open to accepting people, without accreditation, to take up the work and become critical contributors.
There's no royal road. You have to do the work if you want to build the skill.
I'm not an educator but I suspect that AI isn't helping people learn the practice of programming. Certainly not in the sense that Dijkstra meant it. It may be helping people who aren't interested in learning the skills to develop software on their own... up to a point, 70% perhaps. But that's always been the case with low-code/no-code systems.
[0] https://www.cs.utexas.edu/~EWD/ewd02xx/EWD273.PDF
Update: Added missing link, fixed consistent mis-spelling of one of my favourite researchers' name!
I understand where you're coming from, but this would imply that managers, product people and even tech leads don't learn anything from working on a project, which would strongly go against my experience. It is absolutely possible to delegate implementation details but stay close to the problem.
This is nothing new. Algorithmic code generation has been around since forever, and it's robust in a way that "AI" is not. This is what many Java developers do, they have tools that integrate deeply with XML and libraries that consume XML output and create systems from that.
Sure, such tooling is dry and boring rather than absurdly polite and submissive, but if that's your kink, are you sure you want to bring it to work? What does it say about you as a professional?
As for IDE-integrated "assistants" and free-floating LLMs: when I don't get wrong code, they consistently give suggestions that are much, much more complicated than the code I intend to write. If I were to let the ones I've tried write my code, I'd be a huge liability for my team.
I expect the main result of the "AI" boom in software development to be a lot of work for people that are actually fluent, competent developers maintaining, replacing and decommissioning the stuff synthesised by people who aren't.
If you're building something possibly serious, you expect to be hacking away for years anyway. Whether it takes you one or five weekends to make something for recruiting or financing or whatever doesn't actually matter much, it's not a very good sales pitch.
I think a lot of the hype is from managerial people, the kind that have been swayed by promises from people selling RAD and "low code" and so on over the decades. And yeah, you can put your non-technical employees to work building things if you want, but when it breaks your profits are at risk anyway and the consultants you need to hire won't be cheap.
What I'd like to do is to ask "write me libuv based event loop processing messages described by protobuf files in ./protos directory. Use 4 bytes length prefix as a frame header" and then it goes and updates files in IDE itself, adding them to CMakeLists.txt if needed.
That would be AI-assisted coding, and we could then discuss its quality, but does it exist? I'd be happy to give it a go.
Do let me know what you think.
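(As a side note, here is a sketch of just the framing part of that prompt, in TypeScript over a Node socket rather than libuv/C, since TS is the only code style quoted elsewhere in this thread; the protobuf decode from ./protos is left as a comment because the message types aren't specified.)

import * as net from 'net';

// 4-byte big-endian length-prefix framing: each frame on the wire is
// [uint32 length][`length` bytes of payload].
function startServer(port: number, onMessage: (payload: Buffer) => void): net.Server {
  const server = net.createServer((socket) => {
    let pending = Buffer.alloc(0);

    socket.on('data', (chunk) => {
      pending = Buffer.concat([pending, chunk]);

      // Extract as many complete frames as the buffer currently holds.
      while (pending.length >= 4) {
        const frameLength = pending.readUInt32BE(0);
        if (pending.length < 4 + frameLength) break; // wait for the rest of the frame
        const payload = pending.subarray(4, 4 + frameLength);
        pending = pending.subarray(4 + frameLength);
        // Here the payload would be decoded with the classes generated
        // from the .proto files before being handled.
        onMessage(payload);
      }
    });
  });
  return server.listen(port);
}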
Maybe MCP is a step in that direction?
I've been working on an agentic full-app codegen AI startup for about a year, and used Copilot and other coding assistance tools since it was generally available.
Last year, nobody even thought full app coding tools to be possible. Today they're all the rage: I track ~15 full codegen AI startups (what now seems to be called "agentic coding") and ~10 coding assistants. Of these, around half focus on a specific niche (eg. resolving github issues, full-stack coding a few app types, or building the frontend prototype), and half attempt to do full projects.
The paradox that Addy hints at is that senior, knowledgeable developers are much more likely to get value out of both of these categories. For assistants, you need to inspect the output and fix/adapt it. For agentic coders, you need to be able to micromanage or bypass them on issues that block them.
However, more experienced developers are (rightly) wary of new hyped up tools promising the moon. It's the junior devs, and even non-developers who drink the kool aid and embrace this, and then get stuck on 70%, or 90%... and they don't have the knowledge or experience to go past. It's worse than useless, they've spent their time, money, and possibly reputation (within their teams/orgs) on it, and got nothing out of it.
At the startup I mentioned, virtually all our dev time was spent on trying to move that breaking point from 50%, to 70%, to 90%, to larger projects, ... but in most cases it was still there. Literally an exponential amount of effort to move the needle. Based on this, I don't think we'll be able to see fully autonomous coding agents capable of doing non-toy projects any time soon. At the same time, the capabilities are rising and costs dropping down.
IMHO the biggest current limit for agentic coding is the speed (or lack thereof) of state-of-the-art models. If you can get 10x speed, you can throw in 10x more reasoning (inference-time compute, to use the modern buzzword) and get something 1.5x-2x better, in terms of quality or capability to reason about more complex projects.
Why not let AI handle these tasks as well?
Just like Eliza is better at producing a reasonable looking conversation, if you pair it with a cooperating human.
I published it to a git repo with unit tests, great coverage, security scanning, and pretty decent documentation of how the tool works.
I estimate just coding the main tool would have been 2 or 3 days and all the other overhead would have been at least another day or two. So I did a week of work in a few hours today. Maybe it did 70%, maybe it did 42.5%, either way it was a massive improvement to the way I used to work.
I am paraphrasing, of course. It was a little comical that all of a sudden it switched from typing the artifacts to trying to tell me it had committed the changes directly.
In some ways, I'm not impressed by AI because much of what AI has achieved I feel could have been done without AI, it's just that putting all of it in a simple textbox is more "sleek" than putting all that functionality in a complex GUI.
I really dislike the entire narrative that's been built around the LLMs. Feels like startups are just creating hype to milk as much money out of VCs for as long as they can. They also like to use the classic and proven blockchain hype vocabulary (we're still early etc.).
Also, the constant anthropomorphizing of AI is getting ridiculous. We're not even close to replacing juniors with shitty generated code that might work. Reminds me of how we got "sold" self-checkout terminals: more convenient and faster than standing in line with a person, but now you've got to do all the work yourself. Also, the promise of doing stuff faster is nothing new. Productivity is skyrocketing, but burnout is the hot topic at your average software conference.
When the AI boom started in 2022, I was already focused on how to create provably (or at least likely) correct software on a budget.
Since then, I've figured out how to create correct software fast, on rapid iteration. (https://www.osequi.com/)
Now I can combine productivity and quality into one single framework / method / toolchain ... at least for a niche (React apps)
Do I use AI? Only for pair programming: suggestions for algorithms, suggestions for very small technical details like Typescript polymorphism.
Do I need more AI? Not really ...
My framework automates most of the software development process: design (specification and documentation), development, verification. What's left is understanding, a.k.a. designing the software architecture, and for that I'm using math, not AI, which gives me provably correct, translatable-to-code models in a deterministic way. None of this will be offered by AI in the foreseeable future.
But, this was a generic LLM, not a coding assistant. I wonder if they are different and if they remember what you were unhappy with the last time.
Also LLMs seem to be good with languages like Python, and really bad with C and Rust, especially when asked to do something with pointers, ownership, optimization etc.
I see a lot of devs who appear to be in a complete state of denial about what is happening. Understandable, but worrying.
Anyway, it was pretty impressive. I decided, having never used Swift and having never built a 2D iOS game, to give it a go myself. In just a couple of evenings, I have a playable prototype that I confidently say would've taken me weeks to get to on my own.
And I'm learning Swift. Reading open-source projects or Swift tutorials is one thing, but seeing code written to satisfy a prompt request provides an entirely different level of early comprehension.
For me, it feels like being a magic skilled carpenter with the ability to build a giant building, but with no idea what architects do to make blue prints.
Just end up building a useless mess of wood and nails that eventually gets burned down.
I am actually not impressed by the new o1 at all. I think we might be in denial of how progress is slowing down.
Technically, you never will. Anyone who uses these tools in that way becomes a developer.
Confidently stated without any evidence.
All AI proponents use this phrase repeatedly. Yet I still can't get these tools to stop making up APIs that don't exist, and they constantly produce face-palm security issues like hard-coded credentials and SQL injection.
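(For illustration only, my example rather than the parent's: the injection-prone pattern these tools still emit versus the parameterized version, assuming node-postgres as the client.)

import { Pool } from 'pg';

const pool = new Pool();

// The face-palm version: user input concatenated straight into the SQL
// string, a textbook injection vector.
async function findUserUnsafe(email: string) {
  return pool.query(`SELECT * FROM users WHERE email = '${email}'`);
}

// The parameterized version, where the driver handles escaping.
async function findUser(email: string) {
  return pool.query('SELECT * FROM users WHERE email = $1', [email]);
}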
Just like I'm a better programmer in Rust than in C, because I offloaded lots of mundane checking to the compiler.
One worry I have is what will happen to my own skills over time with these tools integrated into my workflow. I do think there's a lot of value in going through the loop of struggling with -> developing a better understanding of technologies. While it's possible to maintain this loop with coding assistants, they're undoubtedly optimized towards providing quick answers/results.
I'm able to accomplish a lot more with these coding assistants now, but it makes me wonder what growth I'm missing out on by not always having to do it the "hard" way.
But yes, you often get stuck, with the genAI looping in its proposals.
These people were never at 70% in the first place.
The article also misses experts using this to accelerate themselves at things they are not expert in.
> the future isn't about AI replacing developers - it's about AI becoming an increasingly capable collaborator that can take initiative while still respecting human guidance and expertise.
I believe we will see humans transition to a purely ceremonial role for regulatory/liability reasons. Airplanes fly themselves with autopilot, but we still insist on putting humans at the yoke because everyone feels more comfortable with the arrangement.
I don’t ever use the code completion functionality in fact it can be a bit annoying. However asking it questions is the new Google search.
Over the last couple of years I’ve noticed that the quality of answers you get from googling has steeply declined, with most results now being terrible ad filled blog spam.
Asking the AI assistant the same query yields so much better answers and gives you the opportunity to delve deeper into said answer if you want to.
No more asking on stack overflow and having to wait for the inevitable snarky response.
It’s the best money I’ve spent on software in years. I feel like Picard asking the computer questions
Are you asking for solutions to a specific problem or searching for information on the problem? I still don't have issue with search engines, because I mostly use it for the latter, treating it as an index of the internet (which is what they really are). And for that AI is a huge step down, because I can't rely on the truthfulness of their replies.
Programming is not just about producing a program, it's about developing a mental model of the problem domain and how all the components interact. You don't get that when Claude is writing all your code, so unless the LLM is flawless (which it likely never be on novel problems), you won't understand the problem enough to know how to fix things when they go wrong.
But it goes to show how difficult it is to discuss this stuff, because depending on model, IDE, workflow, experience/seniority, intuition, etc., LLM-assisted development might look very different in each instance.
Assistants that work best in the hands of someone who already knows what they're doing, removing tedium and providing an additional layer of quality assurance.
Pilot's still needed to get the plane in the air.
But even if the output from these tools is perfect, coding isn't only (or even mainly) about writing code, it's about building complex systems and finding workable solutions through problems that sometimes look like cul de sacs.
Once your codebase reaches a few thousand lines, LLMs struggle to see the big picture and begin introducing one new problem for every one they solve.
> Error messages that make no sense to normal users
> Edge cases that crash the application
> Confusing UI states that never got cleaned up
> Accessibility completely overlooked
> Performance issues on slower devices
> These aren't just P2 bugs - they're the difference between software people tolerate and software people love.
I wonder if we'll see something like the video game crash of 1983. Market saturation with shoddy games/software, followed by stigmatization: no one is willing to try out new apps anymore, because so many suck.
However, one difference between these tools and previous human-developed technologies is that these tools offer direct intelligence, delivered via the cloud to your environment.
That is unprecedented. It's rather like the first time we started piping energy through wires. Sure, it was clunky then, but give it time. LLMs are just the first phase of this new era.
Make a simple HTML page which
uses the VideoEncoder API to
create a video that the user
can download.
So far, not a single AI has managed to create a working solution. I don't know why. The AIs seem to have an understanding of the VideoEncoder API, so it seems it's not a problem of lacking the info they need. But none comes up with something that works.
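(My guess, not the commenter's: the encoding half is straightforward; what trips up the generated answers is that VideoEncoder only emits raw encoded chunks, and turning those into a downloadable file still requires a separate muxing step. A minimal TypeScript sketch of the encoding half, assuming a browser with WebCodecs and a module/async context:)

// Draw 30 frames onto a canvas and encode them with WebCodecs.
const canvas = document.createElement('canvas');
canvas.width = 320;
canvas.height = 240;
const ctx = canvas.getContext('2d')!;

const chunks: EncodedVideoChunk[] = [];
const encoder = new VideoEncoder({
  output: (chunk) => chunks.push(chunk), // raw encoded chunks, not yet a file
  error: (e) => console.error(e),
});
encoder.configure({ codec: 'vp8', width: 320, height: 240 });

for (let i = 0; i < 30; i++) {
  ctx.fillStyle = `hsl(${i * 12}, 80%, 50%)`;
  ctx.fillRect(0, 0, 320, 240);
  const frame = new VideoFrame(canvas, { timestamp: (i * 1_000_000) / 30 });
  encoder.encode(frame, { keyFrame: i === 0 });
  frame.close();
}
await encoder.flush();
// The step most generated answers skip: mux `chunks` into a WebM/MP4 container
// (with a separate muxer library) before offering it via URL.createObjectURL.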
But then also, the importance of using a calculator as one of the tools to double-check your work (because calculators make far fewer mistakes) should not be underestimated... whereas the situation seems to be the opposite with LLMs!
And yet, are we worse at math than the previous generations? I'm not so sure. Pretty quickly math becomes toying with letters and not numbers, and unless the calculator has an algebraic engine, it won't help you.
We can focus on more important aspects of math, like actually understanding the concepts. However, to me this will be worse than the calculator: LLMs pretend to understand the problem and can offer solutions. On the other hand, you don't get a calculator with a CAS until late university (if you ever get one). Calculators don't pretend to be more than what they are.
But IMHO we'll still get the benefits of calculators: let the humans focus on the difficult tasks. I don't want to write the full API scaffolding for the tenth time. I don't want to write the boilerplate for that test framework for the 15th time. LLMs are good at those tasks, let them do it!
As AI becomes able to write more complex code, the engineer's skill must increase accordingly, so they can step in when necessary and diagnose the code it wrote; if you can't, your app is stuck at the level of the AI.
It's easy for a senior engineer to forget how exhausting it is for juniors to read code. I can glance at a page of code from Claude and tell pretty quickly if it's what I want. So it's useful to me if it's right more than half the time. For a junior this is definitely worse than just trying to write it themselves, they would learn more that way and come out less exhausted.
Honestly, this seems like a straw man. The kind of distributed productivity tools like Miro, Figma, Stackblitz, etc. that we all use day-to-day are both impressive in terms of what they do, but even more impressive in terms of how they work. Having been a remote worker 15 years ago, the difference in what is available today is light-years ahead of what was available back then.
Or that he could get successive improvements in the form of real time collaboration with the model.
It is true that a tool that isn't as reliable as an expert won't impress an expert. Even if it's better/faster than any given human on 99% of varied tasks, for fast-response output, "it still isn't great" by an expert's standards.
But as humans, each of our task/field span of expertise or informed amateur fluency is terrifyingly limited compared to the broad awareness of fields and subjects that these models incorporate.
And they are crazy impressive in terms of how much better they have become, qualitatively and quantitatively, inexpensive (to deploy/use) and available, in a few years.
But as for the "why are we doing this in the first place": business documentation usually lives outside the source code and is therefore out of reach of any genAI, for now.
As for what senior devs should do when coding: > They're constantly:
> Refactoring the generated code into smaller, focused modules
> Adding edge case handling the AI missed
> Strengthening type definitions and interfaces
> Questioning architectural decisions
> Adding comprehensive error handling
Ain't nobody got time for that! The one girl and the other guy who could do this, because they know the codebase, have no time to do it. Everyone else works by doing just enough, which is nearly what TDD dictates. And we have PR code review to scrape up quality and barely get maintainable code. And never overcomplicate things, since writing code that works is good enough. And by the time you want to refactor a module three years later, you would want to use another data-flow style or library to do the work altogether.
Oddly enough, the company's intern recruitment is quite sophisticated (and exceptionally fair), and the interns are paid. There have also been cases of extraordinary interns, some of whom got direct job offers immediately.
I would disagree with this. There are many web apps and desktop apps that I’ve been using for years (some open source) and they’ve mostly all gotten noticeably better. I believe this is because the developers can iterate faster with AI.