Most of what people are calling “agents” today are basically deterministic workflows with one or two LLM calls glued together. That is not an agent. That is, at best, an API pipeline.
So I am genuinely curious: are there any real examples of agents handling large, messy, multi-step workflows at scale? Not demos, not toy projects, not VC decks.
Tool-use is common in most of the major AI models now, and it's really the differentiator in how they perform when writing code. Few write correct code the first time. What sets them apart is the ability to read and modify complex code across multiple files, without being told which files.
I think by next year we could see this extend across the UI domain: the model writes code, runs it, views the UI, critiques the result, then tweaks things like font and whitespace. I built a prototype mid-year that would even show the result to a user and talk through what they liked or didn't like. You could even chain it between multiple LLMs (designer, programmer, customer roles), and it would fit your definition.
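The write/run/critique/tweak loop described above can be sketched roughly like this. A minimal sketch only: `call_llm` and `render_ui` are hypothetical stand-ins for a real model API and a real render-and-screenshot step, stubbed so the shape of the loop is visible.

```python
# Sketch of a generate -> run -> view -> critique -> tweak loop.
# call_llm and render_ui are hypothetical stubs, not a real API.

def call_llm(role: str, prompt: str) -> str:
    # Hypothetical: route the prompt to a model with a role-specific
    # system prompt (designer, programmer, customer, ...).
    return f"[{role}] response to: {prompt[:40]}"

def render_ui(code: str) -> str:
    # Hypothetical: actually run the code and capture the rendered UI.
    return f"rendering of {len(code)} chars of code"

def refine_ui(spec: str, max_rounds: int = 3) -> str:
    code = call_llm("programmer", f"Write UI code for: {spec}")
    for _ in range(max_rounds):
        screenshot = render_ui(code)
        critique = call_llm("designer", f"Critique this UI: {screenshot}")
        if "looks good" in critique.lower():  # naive stop condition
            break
        code = call_llm("programmer", f"Revise. Feedback: {critique}\n\n{code}")
    return code

print(refine_ui("a signup form"))
```

The interesting design choice is the stop condition: with real models you would have the critic emit a structured verdict rather than grepping free text.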
Usually it's layers of tools clustered under sub-agents, with fairly detailed orchestration prompts at the higher levels. Orchestration via agent prompts can beat hard-coded workflows when the steps require qualitative assessments.
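One way to picture that layering, as a toy sketch: sub-agents each bundle their own cluster of tools, and the top level routes work between them. The agent names are made up, and the routing here is a keyword stub standing in for the qualitative judgment an orchestration prompt would let an LLM make.

```python
# Toy sketch of prompt-driven orchestration over sub-agents.
# Each sub-agent wraps its own cluster of tools; names are hypothetical.
from typing import Callable

SUB_AGENTS: dict[str, Callable[[str], str]] = {
    "billing": lambda task: f"billing tools handled: {task}",
    "reporting": lambda task: f"reporting tools handled: {task}",
}

def orchestrate(task: str) -> str:
    # In a real system an LLM reads a detailed orchestration prompt and
    # picks the sub-agent, including judgment calls a hard-coded workflow
    # can't express. Stubbed with keyword routing here.
    name = "billing" if "invoice" in task else "reporting"
    return SUB_AGENTS[name](task)

print(orchestrate("fix the invoice mismatch"))
print(orchestrate("summarise last quarter"))
```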
The real value is in horribly manual internal processes, where the solutions are agents driving very specific tools that in turn drive weird and wacky systems.
Generic, out-of-the-box agents that will solve your particular problem are not a thing yet.
As for it being just an “API pipeline”: the power of agents should be deciding which set of API calls to string together to solve a user’s request.
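That framing can be sketched as a planner choosing a call sequence over a catalog, rather than a fixed pipeline. Everything here is hypothetical: the catalog entries are toy stand-ins for real APIs, and `plan` is a keyword stub where a real agent would have an LLM produce the sequence from tool descriptions.

```python
# Sketch: the agent's job framed as choosing which API calls to chain.
# Catalog and requests are hypothetical; plan() stands in for an LLM.

API_CATALOG = {
    "lookup_customer": lambda ctx: {**ctx, "customer_id": 42},
    "fetch_orders":    lambda ctx: {**ctx, "orders": ["A-1", "A-2"]},
    "draft_refund":    lambda ctx: {**ctx, "refund_drafted": True},
}

def plan(request: str) -> list[str]:
    # Stand-in for the planning step: map a request to a call sequence.
    if "refund" in request:
        return ["lookup_customer", "fetch_orders", "draft_refund"]
    return ["lookup_customer"]

def run(request: str) -> dict:
    ctx: dict = {"request": request}
    for step in plan(request):
        ctx = API_CATALOG[step](ctx)
    return ctx

print(run("refund my last order"))
```

The difference from a hard-coded pipeline is that the sequence is chosen per request, not baked in; swap the stubbed `plan` for a model call and the executor stays the same.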