It becomes obsolete in literally weeks, and it also doesn't work 80% of the time. Why write an MCP server for custom tasks when I don't know whether the LLM will reliably call it?
My rule for AI has been steadfast for months (years?) now. I write documentation for myself (templates, checklists, etc.) myself, not with AI, because otherwise I spend more time guiding the AI than thinking about the problem. I give AI a chance to one-shot it in seconds; if it can't, I either review my documentation or just do it manually.
Here is a simple example which took 4 iterations using Gemini to get a result requiring no manual changes:
# Role
You are an expert Unix shell programmer who comments their code and organizes their code using shell programming best practices.
Create a bash shell script which reads from standard input text in Markdown format and prints all embedded hyperlink URL's.
The script requirements are:
- MUST exclude all inline code elements
- MUST exclude all fenced code blocks
- MUST print all hyperlink URL's
- MUST NOT print hyperlink label
- MUST NOT use Perl compatible regular expressions
- MUST NOT use double quotes within comments
- MUST NOT use single quotes within comments
EDIT: For reference, a hand-written script satisfying the above (excluding comments for brevity) could look like:
#!/usr/bin/env bash
perl -ne 'print unless /^```/ ... /^```/' |
sed -e 's/`[^`]*`//g' |
grep -E -o '\[[^][]*\]\([^)]*\)' |
sed -e 's/^.*(//' -e 's/)$//'
[0] https://en.wikipedia.org/wiki/Constraint_programming

(While we're at it, there's no need for an apostrophe when pluralising an initialism, like "URLs".)
E.g.:
Step 1: Define the problem in PROBLEM.md.
Step 2: Ask the agent to gather scope from the codebase and update PROBLEM.md.
Step 3: Ask the agent to create a plan following design and architecture best practices (SOLID, etc.) and update PROBLEM.md.
Step 4: Ask the agent to implement PROBLEM.md.
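A rough sketch of that loop driven non-interactively from the shell, assuming the agent CLI's -p prompt flag (as used elsewhere in this thread); approval and sandboxing flags vary by tool and are omitted here:

    #!/usr/bin/env bash
    set -euo pipefail
    # Step 1 is manual: write PROBLEM.md yourself.
    # Steps 2-4 hand the same file back to the agent, one concern at a time.
    gemini -p "Read PROBLEM.md, gather the relevant scope from this codebase, and update PROBLEM.md with it."
    gemini -p "Read PROBLEM.md and add an implementation plan following design and architecture best practices (SOLID, etc.)."
    gemini -p "Implement the plan in PROBLEM.md."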
You can tell Gemini or Claude it has to use your MCP tool of choice; you can also tell it not to use other tools if it's confused.
Lots of MCPs come with dozens of functions that can be called; you can disable the ones you don't need. That saves tokens, and the agent is less likely to get confused by a crowded context window.
You can also write a script for the agent that calls the MCP if you need to. If the functionality you need is available via the CLI, you can use that instead of writing an MCP server.
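For the CLI route, a minimal sketch of a wrapper script an agent can shell out to instead of an MCP server (the gh CLI and the two subcommands are just an example of the idea, not anything from the article):

    #!/usr/bin/env bash
    # Thin wrapper the agent can call instead of a GitHub MCP server;
    # it limits the surface area to exactly the operations you allow.
    set -euo pipefail
    case "${1:-}" in
      issues)  gh issue list --limit 20 ;;
      pr-diff) gh pr diff "${2:?pr number required}" ;;
      *)       echo "usage: $0 {issues|pr-diff <n>}" >&2; exit 1 ;;
    esac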
I don't do MCPs much because of the effort and security risks. But I find the loop above really effective. The alternative (one-shot or ignore) would be like hiring someone and then, if they get it wrong, telling them "I'll do it myself" (or firing them)... But to each his own (and yes, AIs are not human).
Time spent hand-holding an AI agent is wasted when all your guidance inevitably falls out of the context window and it starts making the same mistakes again.
Anthropomorphizing LLMs like that is the path to madness. That's where all the frustration comes from.
If you look, all the good advice and guidelines for LLMs are effectively the same as for human employees - clarity of communication, sufficient context, not distracting with bullshit, information hygiene, managing trust. There are deep reasons for that, and as a rule of thumb, treating LLMs like naive savants gives reliable intuitions for what works, and what doesn't.
If you treat them as such, it becomes understandable where they might fail and where you might have to guide them.
Also treat them as something that has been biased during training to produce immediately impressive results. This is why they bundle everything into single files and write try/catch patterns where the catch returns mock data, to make for an impressive one-shot demo.
You have to actively fight against this to make them prioritise the scalability of the codebase and its solutions.
The right mental model for working with LLMs is much closer to "person" than to "machine".
1. starting fresh, because of context poisoning / long-term attention issues
2. lots of tools make the job easier if you give them a tool-discovery tool (based on Anthropic's recent post); a rough sketch follows below
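A toy sketch of what a tool-discovery tool can look like (the ./tools layout and the description-on-line-2 convention are made up; the point is that only this one script sits in the context up front and everything else is discovered on demand):

    #!/usr/bin/env bash
    # list-tools: the only tool the agent sees at start-up.
    # Assumed convention: each executable under ./tools has a one-line
    # description in a comment on its second line.
    set -euo pipefail
    for t in tools/*; do
      [ -x "$t" ] || continue
      printf '%s\t%s\n' "$(basename "$t")" "$(sed -n '2s/^# *//p' "$t")"
    done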
We don't have reliable ways to evaluate all the prompts and related tweaking. I'm working towards this with my agentic setup: I added time travel for sessions based on Dagger yesterday, with forking and cloning; a registry will probably come today.
High upside (if the AI manages to complete the task, it saves time), relatively low downside (not getting stuck in these AI feedback loops that are ultimately a waste of time).
Seems odd to me that people spend so much time promoting some of the least productive aspects of AI tooling.
> I haven't tried complex coding tasks using Gemini 3.0 Pro Preview yet. I reckon it won't be materially different.
Gemini CLI is open source and being actively developed, which is cool (/extensions, /model switching, etc.). I think it has the potential to become a lot better and even close to top players.
The correct way of using Gemini CLI is: ABUSE IT! The 1M context window (soon to be 2M) and the generous daily free quota are huge advantages. It's a pity that people don't use it enough (ABUSE it!). I use it as a TUI/CLI tool to orchestrate tasks and workflows.
> Fun fact: I found Gemini CLI pretty good at judging/critiquing code generated by other tools LoL
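That cross-checking use is easy to script, assuming the CLI accepts piped input alongside the -p prompt, which is how these tools are commonly driven non-interactively:

    # Have another tool's uncommitted changes reviewed before you even read them
    # (e.g. after a Claude Code or Codex session leaves edits in the working tree).
    git diff | gemini -p "Review this diff: point out bugs, missing error handling, and anything that looks like demo-ware."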
Recently I even hooked it up with Homebrew via MCP (other Linux package managers as well?) and a local LLM-powered knowledge/context manager (Nowledge Mem); you can get really creative abusing Gemini CLI and unleashing the Gemini power.
I've also seen people use Gemini CLI in SubAgents for MCP Processing (it did work and avoided polluting the main context), can't help laughing when I first read this -> https://x.com/goon_nguyen/status/1987720058504982561
Pro 3 is -very- smart, but its tool use / direction-following isn't great.
In my limited testing, I found that Gemini 3 Pro struggles with even simple coding tasks. Sure, I haven't tested complex scenarios yet and have only done so via Antigravity. But it is very difficult to do that with the limited quota it provides. Impressions here - https://dev.amitgawande.com/2025/antigravity-problem
Personally, I consider Antigravity a positive and ambitious launch. My initial impression was that there are many rough edges to be smoothed out. I hit many errors, like 1. errors communicating with Gemini (Model-as-a-Service) and 2. agent execution terminated due to errors, but somehow it completed the task (the verification/review UX is bad).
Pricing for paid plans with AI Pro or Workspace would be key for its adoption, when Gemini 3.x and Antigravity IDE are ready for serious work.
With a good prompt and some trial and error in the system instructions, as long as you agree to play the agent yourself, it's unmatched.
CLI? Never had any success. Claude Code leaves it in the dust.
- "the World's first ever, fully featured DVD playing sidebar"
- "allowing users to send e-mails without needing to log-in to an account"
- "an AI (Artificial Intelligence) animated speech character named Phoebe"
Maybe this guy really did make a super original web browser with every bell and whistle as an independent sixteen year old. I never saw a release, so the mystery remains.
https://www.irishscientist.ie/2003/contents_contentxml-03p14...
Currently Claude Code is the best, but I don't think Anthropic would pivot it into what I described. Maybe we still need to wait for the next groundbreaking open-source coding agent to come out.
https://github.com/sst/opencode
There are many many similar alternatives, so here's a random sampling: Crush, Aider, Amp Code, Emacs with gptel/acp-shell, Editor Code Assistant (which aims for an editor-agnostic backend that plugs into different editors)
Finally... there is quite a lot of scope for co-designing the affordances / primitives supported by the coding agent and the LLM backing it (especially in LLM post-training). So factorizing these two into completely independent pieces currently seems unlikely to give the most powerful capabilities.
They don't have to pivot, since it already exists: Claude Code Router [1].
Alas, you don't install Claude Code or Gemini CLI for the actual CLI tool. You install it because the only way agentic coding makes sense is through subscription billing at the vendor - SOTA models burn through tokens too fast for pay-per-use API billing to make sense here; we're talking literally a day of basic use costing more than a monthly subscription to the Max plan at $200 or so.
Cursor?
It’s really quite good.
Ironically it has its own LLM now, https://cursor.com/blog/composer, so it’s sort of going the other way.
I love to switch models and ask them what they thought of the previous model's answer.
Roo Code or maybe Kilo (which is a fork of Roo)
Sure, software in general will keep evolving rapidly but the methods and tools to build software need to be relatively more stable. E.g. many languages and frameworks come and go, but how we break down a problem, how we discover and understand codebases, etc. have more or less remained steady (I think).
I see this as a paradox and have no idea what the state of equilibrium will look like.
You don't need Claude Code, gemini-cli, or Codex. I've been doing it raw as a (recent) LazyVim user with a proprietary agent with 3 tools: git, ask, and ripgrep, and currently Gemini 3 is by far the best for me, even without all these tricks.
Gemini 3 has a very high token density and a significantly larger context than any other model that is actually usable. Every 'agent' I start shoves 5 things into the context:
- most basic instructions such as: generate git format diff only when editing files and use the git tool to merge it (simplified, it's more structured and deeper than this)
- tree command that respects git ignore
- $(ask "summarize $(git diff)")
- $(ask "compact the readme $(cat README.MD"))
- (ripgrep tools, mcp details, etc)
When the context is too bloated, I just tell it to write important new details to README.MD and then start a new agent.
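A rough sketch of how that starting context could be assembled from the shell; `ask` stands in for this commenter's proprietary one-shot helper, instructions.txt is a hypothetical file holding the standing rules, and tree --gitignore needs tree 2.0+:

    #!/usr/bin/env bash
    # Hypothetical context builder in the spirit of the setup described above.
    set -euo pipefail
    {
      cat instructions.txt                        # standing rules: diff-only edits, use the git tool, etc.
      tree --gitignore                            # file tree that respects .gitignore
      ask "summarize $(git diff)"                 # current uncommitted work, summarized
      ask "compact the readme $(cat README.MD)"   # compressed project overview
    } > agent-context.txt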
I don't even generate diffs, just full files (though I try to keep them small), and my success rate is probably close to 80% one-shotting very complex coding tasks that would take me days.
Earlier models couldn't generate diffs and I had to generate them myself, which was janky since sometimes the model would produce unmergeable code.
Tip 1: it consistently ignores my GEMINI.md file, both global and local, even though it always says "1 GEMINI.md file is being used", probably because the file exists at the right path.
Tip 12, had no idea you could do this, seems like a great tip to me.
Tip 16 was great, thanks. I've been restarting it every time my environment changes for some reason, or having it run direnv for me.
All the same warnings about AI apply for Gemini CLI, it hallucinates wildly.
But I have to say Gemini CLI gave me my first really fun experience using AI. I was a latecomer to AI, but what really hooked me was when I gave it permission to freely troubleshoot a k8s PoC cluster I was setting up. Watching it autonomously fetch logs and objects and troubleshoot until it found the error was the closest thing to getting a new toy for Christmas I've had in many years.
So I've kept using it, but it is frustrating sometimes when the AI is behaving so stupidly that you just /quit and do it yourself.
I think many devs are just in tune with the "nature" of Claude, and run aground more easily when trying to use Gemini or ChatGPT. This also explains why we get these perplexing mixed signals from different devs.
There certainly is some user preference, but the deal breakers are flat out shortcomings that other tools solved (in AI terms) long ago. I haven’t dealt with agent loops since March with any other tool.
Codex prompt editing sucks
BTW Gemini 3 via Copilot doesn't currently work in Opencode: https://github.com/sst/opencode/issues/4468
> A modern terminal emulator like:
> WezTerm, cross-platform
> Alacritty, cross-platform
> Ghostty, Linux and macOS
> Kitty, Linux and macOS
What's wrong with any terminal? Are those performance gains that important when handling a TUI? :-(
Edit:
Also, I don't see Gemini listed here:
https://opencode.ai/docs/providers/
Only Google Vertex AI (?): https://opencode.ai/docs/providers/#google-vertex-ai
Edit 2:
Ah, Gemini is the model and Google Vertex AI is like AWS Bedrock, it's the Google service actually serving Gemini. I wonder if Gemini can be used from OpenCode when made available through a Google Workspace subscription...
Gemini 3 via any provider except Copilot should work in Opencode.
Good. I'd rather use a tool designed with focus on modern standards than something that has to keep supporting ancient ones every time they roll an update.
But I can't help but get "AI tutorial fatigue" from so many posts telling me how to use AI. Most are garbage; this one is better than most. It's like how JavaScript developers endlessly post about the newest UI framework or JS build tool. This feels a lot like that.
Boring.
Sucks when the LLM goes on a rant only to stop because of hardcoded safeguards, or what I encounter often enough with Copilot: it generates some code, notices it's part of existing public code and cancels the entire response. But that still counts towards my usage.
With Gemini 3 release I decided to give it another go, and now the error changed to: "You've reached the daily limit with this model", even though I have an API key with billing set up. It wouldn't let me even try Gemini 3 and even after switching to Gemini 2.5 it would still throw this error after a few messages.
Google might have the best LLMs, but its agentic coding experience leaves a lot to be desired.
I have sympathy for any others who did not get so lucky
> Loaded cached credentials.
> Hello world! I am ready for your first command.
> gemini -p "hello world"  2.35s user 0.81s system 33% cpu 29.454 total
Seeing between 10 and 80 seconds for responses on hello world, 10-20s of which is for loading the goddamn credentials. This thing needs a lot of work.
it's really really terrible at agentic stuff
will give the CLI another shot
And the GPT-5 Codex has a very somber tone. Responses are very brief.
Considering that access is limited to the countries on the list [0], I wonder what motivated their choices, especially since many Balkan countries were left out.
[0]: https://developers.google.com/gemini-code-assist/resources/a...
Gemini models are actually pretty capable but Gemini CLI tooling makes them dumb and useless. Google is simply months behind Anthropic and OpenAI in this space!
Still, I had high hopes for Gemini 3.0 but was let down by the benchmarks. I can barely use it in the CLI; in AI Studio, however, it has been pretty valuable, though not without quirks and bugs.
Lately it seems like all the agentic coders, like Claude and Codex, are starting to converge, differentiated only by latency and overall CLI UX.
I would like to use Gemini CLI more, and even Grok, if it were possible to use them like Codex.
There needs to be a lot more focus on observability and on showing users what is happening under the hood (especially with respect to costs and context management for non-power users).
A useful feature Cursor has that Antigravity doesn't is the context wheel that increases as you reach the context window limit (but don't get me started on the blackbox that is Cursor pricing).
Best practices for gambling
I’m noticing more workflows stressing the need for lightweight governance signals between agents.
Integration with Google Docs/Spreadsheets/Drive seems interesting but it seems to be via MCP so nothing exclusive/native to Gemini CLI I presume?
Claude at least is more than good enough to do this for dry technical writing (I've not tried it for anything more creative), and so I usually end up using Claude Code to do this with markdown files.
https://news.ycombinator.com/item?id=46048996
Who the heck trusts this jank to have free rein on their system?
It's just not really good yet.
I recently tried IntelliJ's Junie and I have to say it works rather well.
I mean, at the end of the day all of them need a human in the loop and the result is only as good as your prompt, though with Junie I at least got something of a result most of the time, while with Gemini 50% would have been a good rate.
Finally: I still don't see agentic coding being used for production; it's just not there yet in terms of quality. For research and fun? Why not.
Of Addy Osmani fame.
I seriously doubt he went to Gemini and told it "Give me a list of 30 identifiable issues when agentic coding, and tips to solve them".
It fucked up the entire repo. It hardcoded tenant IDs and user IDs, it completely destroyed my UI, and it broke my entire GraphQL integration. Set me back 2 weeks of work.
I do admit the browser version of Gemini chat does a much better job of providing architecture and design guidance from time to time.
Always make the agent write a plan first and save it to something like plan.md, and tell it to update the list of finished tasks in status.md as it finishes each task from plan.md and to let you review the change before proceeding to the next task.
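One way to make that policy stick instead of repeating it every session is to append it to the agent's project instructions file (GEMINI.md for the tool in this thread, CLAUDE.md for Claude Code); the wording below is only a sketch:

    # Bake the plan/status convention into the project instructions file.
    printf '%s\n' \
      'Before writing code, write a plan to plan.md and wait for my approval.' \
      'Work through plan.md one task at a time. After each task, append it to the' \
      'finished list in status.md and stop so I can review the change.' >> GEMINI.md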
How did this happen?
Did you let the agent loose without first creating its own git worktree?
They're useful for allowing agents to work in parallel. I imagine some people give them access to git and tools and sandbox the agents, then let a bunch of them work in separate git worktrees cut from the same branch, then come back and investigate/compare and contrast what the agents have done, to accelerate their work.
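A minimal sketch of that setup (directory and branch names are made up; git refuses to check the same branch out in two worktrees, so each agent gets its own branch cut from the shared starting point):

    # One worktree per agent, each on its own branch cut from main.
    git worktree add ../agent-a -b try-agent-a main
    git worktree add ../agent-b -b try-agent-b main
    # Point one sandboxed agent at each directory, then compare the results:
    git diff try-agent-a try-agent-b
    git worktree remove ../agent-a   # clean up when done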
I think there is value in that but it also feels like a lot of very draining work and I imagine long term you're no longer in control of the code base. Which, I mean, great if you're working on a huge code base since you already don't control that...