It leaves Claude and ChatGPT's coding looking like it's from a different century. It's hard to believe these changes are arriving in a matter of weeks and months. Last month I could not believe how good Claude was. Today I'm not sure how I could continue programming without Google Gemini in my toolkit.
Gemini AI Studio is such a giant leap ahead in programming I have to pinch myself when I'm using it.
Apart from the apologising. It's silly when the AI apologises with ever more sincere apologies. There should be no apologies from AIs.
"..the user suggests using XYZ to move forward, but that would be rather inefficient, perhaps the user is not totally aware of the characteristics of XYZ. We should suggest moving forward with ABC and explain why it is the better choice..."
It depends, because you now have to pay in order to compete against other programmers who're also using AI tools. It wasn't like that in what I'd call the true "golden age", basically the '90s through the early 2000s, when the internet was already a thing and one could put together something very cool with just a "basic" text editor.
We used to serve others, but now people are so excited about serving themselves first that there's almost no talk of service to others at all anymore.
We're literally creating a solution to our own problem.
The Omega Directive: https://snth.prose.sh/the_omega_directive
I tend to form the story arc in my head, and outline the major events in a timeline, and create very short summaries of important scenes, then use AI to turn those summaries into rough narrative outlines by asking me questions and then using my answers to fill in the details.
Next I'll feed that abbreviated manuscript into AI and brainstorm as to what's missing/where the flow could use improvement/etc with no consideration for prose quality, and start filling in gaps with new scenes until I feel like I have a compelling rough outline.
Then I just plow from beginning to end rewriting each chapter, first with AI to do a "beta" draft, then I rewrite significant chunks by hand to make things really sharp.
After this is done I'll feed the manuscript back into AI and get it to beta read given my target audience profile and ambitions for the book, and ask it to provide me feedback on how I can improve the book. Then I start editing based on this, occasionally adding/deleting scenes or overhauling ones that don't quite work based on a combination of my and AI's estimation. When Gemini starts telling me it can't think of much to improve the manuscript that's when it's time for human beta readers.
I've been wondering about the legalities of the generated content, though, since we know that a lot of the artistic source content was used without consent. Can I put the stories on my blog? Or, not that I want to, publish them? I guess people use AI-generated code everywhere, so for practical purposes the cat is out of the bag and won't be put back in again.
Pay-as-you-go with Gemini does not snort your data for their own purposes (allegedly...).
Not happening. Investors would riot.
Looking forward to stage 2 - start serving the advertisers while placating the users, and finally stage 3 - offering it all up to the investors while playing the advertisers off each other and continuing to placate the users.
Each new release is “game changing”.
The implication being the last release y’all said was “game changing” is now “from a different century”.
Do you see it?
For this to be an accurate and true assessment, you would have to have been wrong before and be wrong now.
Are you suggesting that a rush to hyperbole which you don't like means advances in a technology aren't groundbreaking?
Or is it that if there is more than one impressive advance in a technology, any advance before the latest wasn't worthy of admiration at the time?
I couldn’t find a way to use Gemini like a prepaid plan. I ain’t giving my credit card to Google for an LLM that can easily charge me hundreds or thousands of EUR.
I want something running in a VM I can safely let all tools execute without human confirmation and I want to write my own tools and plug them in.
Right now a pro max subscription with Claude code plus my MCP servers seems to be the sweet spot, and a cursory look at the Google ecosystem didn’t identify anything like it. Am I overlooking something?
It's my daily driver so far. I switch between the Claude and Gemini models depending on the type of work I'm doing. When I know exactly what I want, I use Claude. When I'm experimenting and discovering, I use Gemini.
Not really my idea of good.
They all seem to work remarkably well writing TypeScript or Python, but in my experience they fall short when it comes to shell and, more broadly, DevOps.
Would you expect that to be Google employing cost-saving measures?
https://developers.google.com/gemini-code-assist/docs/overvi...
What about Zed or something else?
I have not used any IDEs like Cursor or Zed, so I am not sure what I should be using (on Linux). I typically just get on Claude (claude.ai) or ChatGPT and do everything manually. It has worked fine for me so far, but if there is a way to reduce friction, I am willing to give it a try. I do not really need anything advanced, however. I just want to feed it the whole codebase (at times), some documentation, and then provide prompts. I mostly care about support for Claude and perhaps Gemini (would like to try it out).
Is there any concrete example that makes it really obvious? I have had no such success with it so far, and I would really like to see the clear difference between Gemini and the others.
It's pretty useful as long as you hold it back from writing code too early, or too generally, or sometimes at all. It's a chronic over-writer of code, too. Ignoring most of what it attempts to write and using it to explore the design space without ever getting bogged down in code and other implementation details is great though.
I've been doing something that's new to me but is going to be all over the training data (subscription service using stripe) and have often been able to pivot the planned design of different aspects before writing a single line of code because I can get all the data it already has regurgitated in the context of my particular tech stack and use case.
Essentially we were hoping to tie that to data inputs and have a system to regularly output the visualisation but with dynamic values. I bet my colleague it would one-shot it: it did.
What I’ve also found is that even a sloppy prompt still somehow is reading my mind on what to do, even though I’ve expressed myself poorly.
Inversely, I’ve really found myself rejecting suggestions from ChatGPT, even o4-mini-high. It’s just doing so much random crap I didn’t ask and the code is… let’s say not as “Gemini” as I’d prefer.
Every time there is a post here about Gemini, as there has been every few days for the last three or four weeks, the top comment, or one of the top comments, is something along these lines. And every time, I spend a few minutes running empirical tests to check whether I made a mistake in cancelling my paid Gemini account after giving up on it...
So I just did a couple of tests, sending the same prompts on some AWS-related questions to Gemini Pro 2.5 (free) and paid Claude, and no, Claude is still better.
That doesn't mean it's worse than the others just not much better. I haven't found anything that worked better than o1-preview so far. How are you using it?
// Moved to foo.ts
Ok, great. That’s what git is for.
// Loop over the users array
Ya. I can read code at a CS101 level, thanks.
Anyone else concerned about these kinds of statements? Make no mistake, everyone: we are living in an LLM bubble (not an AI bubble, as none of these companies are actually interested in AI as such, or in moving towards AGI). They are all trying to commercialise LLMs with some minor tweaks. I don't expect LLMs to make the kind of progress made by the first 3 iterations of GPT. And when the insanely hyped overvaluations crash, the bubble WILL burst. You had BETTER hope there is some money left to run these kinds of tools at a profit, or you will be back on Stack Overflow trying to relearn all the skills you lost to generative coding tools.
But seriously, yeah, Gemini is pretty great.
But more seriously, they need to uncap temperature and allow more samplers if they want to really flex on their competition.
But I don't see how this is good news at all from a societal POV.
The last 15 or so years has seen an unprecedented rise in salaries for engineers, especially software engineers. This has brought an interest in the profession from people who would normally not have considered SW as a profession. I think this is both good and bad. It has brought new found wealth to more people, but it may have also diluted the quality of the talent pool. That said, I think it was mostly good.
Now with this game-changing efficiency from these AI tools, I'm sure we've seen an end to the glory days in terms of salaries for the SW profession.
With this gone, where else could relatively normal people achieve financial independence? Definitely not in the service industry.
Very sad.
> But I don't see how this is good news at all from a societal POV.
Think about all the lamplighters who lost their jobs. Streetlights just turn on now? Lamplighting used to be considered a stable job! And what about the ice cutters…
For real tho, it's not like there's nothing left to do — we still have potholes to fix, t-shirts to fold and illnesses to cure. Just the fact that many people continue to believe that wars are justified by resource scarcity shows we need technological progress.
These days not so much.
Learning comes through struggle and it's too easy to bypass that struggle now. It's so much easier to get the answers from AI.
I often find myself repeating this, although one would think it's well-known or even self-evident.
If there's no active struggle, there's no remaining knowledge, it's just fleeting information.
The amount and complexity of software will expand to its very outer bounds, and specialists will be required to handle it.
There are plenty of folks making a living using platforms like Salesforce and “clicks not code,” but it never led to an implosion of the SE job market. Just expanded the tech job pool. And it’s hard to imagine how that would have happened if everything needed to be coded.
Like how a growth in medical-paraprofessionals didn’t negate the need for doctors and nurses.
With all this money sloshing around, it takes only a little imagination to think of ways of channeling some of it to working people without employing them to write pointless (or in some cases actively harmful) software.
I personally don't think we are ever going to get to the point where I can give a simple prompt and have an LLM generate a complex app ready to run. Think about what that would require:
1. The LLM would have to read my mind and extrapolate all the minute decisions I would make to implement the app based on the prompt.
2. Assuming the LLM can get past (1), it would have to basically be AGI to be able to implement pretty much whatever I can dream up.
3. If (1) and (2) are somehow achieved, it would be economically very valuable, and you can bet that functionality is not going to be casually enabled in LLMs for just anyone to use.
(Also, just by market logic, rare skills in demand are always paid more; I'm not sure why you're calling it an "exclusion". The education system in a lot of places might have that function, but that's a separate issue not helped by LLMs writing SQL?)
I also contest your definition of wealth. Society absolutely and obviously becomes wealthier when many more people are able to use computers for more advanced things. Just because that wealth doesn't directly appear as zeros in your financial statements doesn't mean the wealth hasn't been created.
I do have to ask, though: who do you think will pay the electricity bill for the groups most lacking in wealth to use LLMs? Some things might be free right now, but what do you think will happen when some of, e.g., OpenAI's $300bn valuation starts being collected?
I feel like that's actually true now with LLMs -- if some query I write doesn't get one-shotted, I don't bother with a galaxy-brain prompt; I just shelve it 'til next month and the next big OpenAI/Anthropic/Google model will usually crush it.
Feels like innovation in AI is rapidly changing from paradigm-shifting to incremental.
A month to write some code with an LLM: that's quite the opposite of the promised productivity gain.
A writer won't think that LLMs are good at creative writing. In fact, I'm pretty sure they'd think LLMs are terrible at creative writing.
In other words, to an expert in their field, they're not that good - at least not yet.
But to someone who is not an expert, they're unbelievably good - they're enabled to do something they had zero ability to do before.
That's all it takes to get reliably excellent results. It's not perfect, but, at this point, 90% hallucinations on normal SDK usage strongly suggests poor usage of what is the current state of the art.
If you had something on the other side to hallucinate the API itself you could have a program that dreams itself into existence as you use it.
Then it apologizes and gives the right answer. It's weird. We really need a new word for what they're doing, 'cos it ain't thinking.
Do you need an expert to verify if the answer from AI is correct? How is it time saved refining prompts instead of SQL? Is it typing time? How can you know the results are correct if you aren't able to do it yourself? Why should a junior (sorcerer's apprentice) be trusted in charge of using AI? No matter the domain, from art to code to business rules, you still need an expert to verify the results. Would they (and their company) be in a better place to design a solution to a problem themselves, knowing their own assumptions? Or just check off a list of happy-path results without a FULL knowledge of the underlying design? This is not just a change from hand-crafting to line-production, it's a change from deterministic problem-solving to near-enough-is-good-enough, sold as the new truth in problem-solving. It smells wrong.
We recently did the first speed run where Louie.ai beat teams of professional cybersecurity analysts in an open competition, Splunk's annual Boss of the SOC. Think writing queries, wrangling Python, and scanning through 100+ log sources to answer frustratingly sloppy database questions:
- We get 100% correct on the basic stuff in the first half, where most people take 5-15 minutes per question, and 50% correct in the second half, where most people take 15-45+ minutes per question and most teams time out.
- ... Louie does a median 2-3min per question irrespective of the expected difficulty, so about 10X faster than a team of 5 (wall clock), and 30X less work (person hours). Louie isn't burnt out at the end ;-)
- This doesn't happen out-of-the-box with frontier models, including fancy reasoning ones. Likewise, letting the typical tool here burn tokens until it finds an answer would cost more than a new hire, which is why we measure it as a speed run rather than a deceptively uncapped auto-solve count.
- The frontier models DO have good intuition, understand many errors, and, for popular languages, DO generate good text2query. We are generally happy with OpenAI, for example, so it's more about how Louie and the operator use it.
- We found we had to add in key context and strategies. You see a bit of this in Claude Code and Cursor, except those are quite generic, so they would have failed here as well. Intuitively, in coding you want to use types/lint/tests, and database work has similar-but-different issues. But in my experience there is a lot more, domain by domain, and expecting tools to just work is unlikely to pan out, so having domain-relevant patterns baked in, and extensible, is key, and so are learning loops.
A bit more on Louie's speed run here: https://www.linkedin.com/posts/leo-meyerovich-09649219_genai...
This is our first attempt at the speed run. I expect Louie to improve: my answers represent the current floor, not the ceiling of where things are (dizzyingly) going. Happy to answer any other q's where data might help!
The speed run formulation for all those same questions helps measure real-world quality vs cost trade-offs. I don't find uncapped solve rates to be relevant to most scenarios. If we allowed infinite time, yes we would have scored even higher... But if our users also ran it that way, it would bankrupt them.
If anyone is in the industry, there are surprisingly few open tests here. That is another part of why we did BOTS. IMO sunlight here brings progress, and I would love to chat with others on doing more open benchmarks!
> Do you need an expert to verify if the answer from AI is correct?
If the underlying data has a quality issue that is not obvious to a human, the AI will miss it too. Otherwise, the AI will correct it for you. But I would argue that it's highly probable your expert would have missed it too... So, no, it's not a silver bullet yet, and the AI model often lacks context that humans have, and the capacity to take a step back.
> How is it time saved refining prompts instead of SQL?
I wouldn't call that "prompting". It's just a chat. I'm at least ~10x faster (for reasonably complex & interesting queries).
There isn't one perfect solution to SQL queries against complex systems.
A sudoku has one solution.
A reasonably well-optimised SQL solution is what good use of SQL tries to achieve. And it can be the difference between a total lock-up and the fast run of a script that keeps the rest of a complex system from falling over.
It's not even about whether or not the number of solutions is limited. A math problem can have an unlimited number of proofs (if we allow arbitrarily long proofs), but it's still easier to verify one than to come up with one.
Of course writing SQL isn't necessarily comparable to sudoku. But the difference, in the context of verifiability, is definitely not "SQL has no single solution."
So for example, I was mucking around with ffmpeg and mkv files, and instead of searching for the answer to my thought-bubble (which I doubt would have been "quick" or "productive" on google), I straight up asked it what I wanted to know;
> are there any features for mkv files like what ffmpeg does when making mp4 files with the option `--movflags faststart`?
And it gave me a great answer! (...the answer happened to be based upon our prior conversation of av1 encoding, and so it told me about increasing the I-frame frequency).
Another example from today - I was trying to build mp4v2 but ran into drama because I don't want to take the easy road and install all the programs needed to "build" (I've taken to doing my hobby-coding as if I'm on a corporate-PC without admin rights (windows)). I also don't know about "cmake" and stuff, but I went and downloaded the portable zip and moved the exe to my `%user-path%/tools/` folder, but it gave an error. I did a quick search, but the google results were grim, so I went to chat-gpt. I said; > I'm trying to build this project off github, but I don't have cmake installed because I can't, so I'm using a portable version. It's giving me this error though: [*error*]
And the aforementioned error was pretty generic, but chat-gpt still gave a fantastic response along the lines of; > Ok, first off, you must not have all the files that cmake.exe needs in the same folder, so to fix do ..[stuff, including explicit powershell commands to set PATH variables, as I had told it I was using powershell before].
> And once cmake is fixed, you still need [this and that].
> For [this], and because you want portable, here's how to setup Ninja [...]
> For [that], and even though you said you dont want to install things, you might consider ..[MSVC instructions].
> If not, you can ..[mingw-w64 instructions].
> I'm wondering if it would be beneficial to add an electric-assist motor to an existing petrol vehicle. There are some 2010 era SUV's that have relatively uneconomical petrol engines, which may be good candidates. That is because some of them are RWD, whilst some are AWD. The AWD gearbox and transfer case could be fitted to the RWD, leaving the transfers front "output" unconnected. Could an electric motor then be connected to this shaft, hence making it an input?
It gave a decent answer, but it was focused on the "front diff" and "front driveshaft" and stuff like that. It hadn't quite grasped what I was implying, although it knew what it was talking about! It brought up various things that I knew were relevant (the "domain knowledge" aspect), so I brought some of those things in my reply (like about the viscous coupling and torque split); > I mentioned the AWD gearbox+transfer into a RWD-only vehicle, thus keeping it RWD only. Thus both petrol+electric would be "driving" at the same time, but I imagine the electric would reduce the effort required from the petrol. The transfer case is a simple "differential" type, without any control or viscous couplings or anything - just simple gear ratio differences that normally torque-split 35% to the front and 65% to the rear. So I imagine the open-differential would handle the 2 different input speeds could "combine" to 1 output?
That was enough to "fix" its answer (see below). And IMO, it was a good answer! I'm posting this because I read a thread on here yesterday/2-days-ago about people struggling with their AI's context/conversation getting "poisoned" (their word). So whilst I don't use AI that much, I also haven't had issues with it, and maybe that's because of the way I converse with it?
---------
"Edit": Well, the conversation was too long for HN, so I put it here - https://gist.github.com/neRok00/53e97988e1a3e41f3a688a75fe3b...
It's the cleanest way to give the right context and the best place to pull a human in the loop.
A human can validate and create all important metrics (e.g. what does "monthly active users" really mean) then an LLM can use that metric definition whenever asked for MAU.
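For instance, pinning that definition down once as a SQL view might look like this (a sketch only; the events table, its columns, and the 30-day window are all assumptions — the point is that a human signs off on the definition once):

CREATE VIEW monthly_active_users AS
-- "active" here means: at least one event in the last 30 days
SELECT count(DISTINCT user_id) AS mau
FROM events
WHERE event_time >= now() - interval '30 days';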
With a semantic layer, you get the added benefit of writing queries in JSON instead of raw SQL. LLMs are much more consistent at writing a small JSON query vs. hundreds of lines of SQL.
We[0] use cube[1] for this. It's the best open source semantic layer, but there are a couple of closed source options too.
My last company wrote a post on this in 2021[2]. Looks like the acquirer stopped paying for the blog hosting, but the HN post is still up.
I’m sorry, I can’t. The tail is wagging the dog.
dang, can you delete my account and scrub my history? I’m serious.
{
"dimensions": [
"users.state",
"users.city",
"orders.status"
],
"measures": [
"orders.count"
],
"filters": [
{
"member": "users.state",
"operator": "notEquals",
"values": ["us-wa"]
}
],
"timeDimensions": [
{
"dimension": "orders.created_at",
"dateRange": ["2020-01-01", "2021-01-01"]
}
],
"limit": 10
}
than this:

SELECT
users.state,
users.city,
orders.status,
sum(orders.count)
FROM orders
CROSS JOIN users
WHERE
users.state != 'us-wa'
AND orders.created_at BETWEEN '2020-01-01' AND '2021-01-01'
GROUP BY 1, 2, 3
LIMIT 10;
From a schema standpoint, table `orders` presumably has a row per order, with columns like `user_id`, `status` (as you stated), `created_at` (same), etc. Why would there be a `count` column? What does that represent?
From a query standpoint, I'm not sure what this would accomplish. You want the cartesian product of `users` and `orders`, filtered to all states except Washington, and where the order was created in 2020? The only reason I can think of to use a CROSS JOIN would be if there is no logical link between the tables, but that doesn't make any sense for this, because users:orders should be a 1:M relationship. Orders don't place themselves.
I think what you might have meant would be:
SELECT
users.state,
users.city,
orders.status,
COUNT(*)
FROM users
JOIN orders ON users.id = orders.user_id
WHERE
users.state != 'us-wa' AND
orders.created_at BETWEEN '2020-01-01' AND '2021-01-01'
GROUP BY 1, 2, 3
LIMIT 10;
Though without an ORDER BY this has no significant meaning, and is a random sampling at best.

Also, if you or anyone else is creating a schema like this, _please_ don't make this denormalized mess. `orders.status` is going to be extremely low cardinality, as is `users.state` (to a lesser extent), and `users.city` (to an even lesser extent, but still). Make separate lookup tables for `city` and/or `state` (you don't even need to worry about pre-populating these, you can use GeoNames[0]). For `status`, you could do the same, or turn it into a native ENUM[1] if you'd like to save a lookup.
[1]: https://www.postgresql.org/docs/current/datatype-enum.html
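To sketch that suggestion out (all table, column, and ENUM values here are illustrative, not from the original schema):

-- status becomes a native ENUM instead of free text
CREATE TYPE order_status AS ENUM ('pending', 'paid', 'shipped', 'cancelled');

-- low-cardinality location data moves into a lookup table
CREATE TABLE cities (
    id    serial PRIMARY KEY,
    name  text NOT NULL,
    state text NOT NULL
);

CREATE TABLE users (
    id      serial PRIMARY KEY,
    city_id integer REFERENCES cities (id)
);

CREATE TABLE orders (
    id         serial PRIMARY KEY,
    user_id    integer NOT NULL REFERENCES users (id),
    status     order_status NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);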
> Please convert the following fragment of a programming language (auto-detect) into a json-like parsing information when language construct is represented like an object, fixed branches are represented like properties and iterative clauses (statement list for example) as array.
JSON:
{"foo": ["bar", 42]}
XON: <Object>
<Property>
<Key>foo</Key>
<Value>
<Array>
<String>bar</String>
<Number>42</Number>
</Array>
</Value>
</Property>
</Object>
It gives you all the flexibility of JSON with the mature tooling of XML!

Edit: jesus christ, it actually exists https://sevenval.gitbook.io/flat/reference/templating/oxn
^ kids, this is what AI-induced brainrot looks like.
In all seriousness, I have some complaints about SQL (I think LINQ's reordering of it is a good idea), but there's no need to invent another layer in order for LLMs to be able to wrangle it.
You should have written your comment in JSON instead of raw English.
But I would never use one that forced me to express my queries in JSON. The best implementations integrate right into the database, so they become an integral part of your regular SQL queries and, as such, are also available to all your tools.
In my experience, from using the Exasol Semantic Layer, it can be a totally seamless experience.
At the moment GCP are at 76%, humans are at 93%.
LLMs make some things that were difficult very easy now.
Good article!
I’m curious to know what people are doing to measure whether the customer got what they were looking for. Thumbs up/down seems insufficient to me.
The ability of the LLM to perform purely depends on having good knowledge of what is going to get asked and how, which is more complex than it sounds
What techniques are people having success with?
Works pretty well for me, where you can typically get within the range of human2human variance.
I'm certain they'll get there soon, they're just not there yet.
I don't need AI to generate perfect SQL, because I am never going to trust the output enough to copy/paste it — the risk of subtle semantic errors is too high, even if the code validates.
Instead, I find it helpful for AI to suggest approaches — after which I will manually craft the SQL, starting from scratch.
Anyone that knows a database well can bring it down with an innocent-looking statement that no one else will blink at.
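A classic of the genre, for illustration (assuming a large users table with a plain index on email; schema is made up):

-- looks harmless, but lower() defeats the plain index on email
-- and the leading wildcard forces a full table scan
SELECT * FROM users WHERE lower(email) LIKE '%@example.com';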
I also tend to turn to AI for advising me on difficult use cases, and most of the time it's for production code rather than one-offs. The easy cases, I just write myself because it's more mental effort to review code for subtle errors than it is to write it.
It seems to me that this skeptical mindset is consonant with handling AI output with care.
AI is not going to replace the senior SQL expert with 20 years of battle experience in the short term, but it supports me, someone who last dug into SQL 15 years ago and needs to get a working SQL query in a project. And AI usually does a better job than me copy-pasting googled code in between quickly browsing through tutorials.
Is it to build a copilot for a data analyst or to get business insight without going through an analyst?
If it’s the latter - then imho no amount of text to sql sophistication will solve the problem because it’s impossible for a non analyst to understand if the sql is correct or sufficient.
These don’t seem like text2sql problems:
> Why did we hit only 80% of our daily ecommmerce transaction yesterday?
> Why is customer acquisition cost trending up?
> Why was the campaign in NYC worse than the same in SF?
Correct, but I would propose two things to add to your analysis:
1. Natural language text is a universal input to LLM systems
2. text2sql makes the foundation of retrieving the information that can help answer these higher-level questions
And so in my mind, the goals for text2sql might be a copilot (near-term), but the long-term is to have a good foundation for automating text2sql calls, comparing results, and pulling them into a larger workflow precisely to help answer the kinds of questions you're proposing.
There's clearly much work needed to achieve that goal.
But of course the real issue is that if your report metrics change at the last minute, you're unlikely to get a good report. That's a symptom of not thinking much about your metrics.
Also, reports/analyses generally take time because the underlying data are messy, lots of business knowledge is encoded "out of band", and the data infrastructure is poor. The smarter analytics leaders will use the AI push to invest in the foundations.
I assume a useful goal would be to guide development of the system in coordination with experts, test it, have the AI explain all trade offs, potential bugs, sense check it against expected results etc.
Taste is hard to automate. Real insight is hard to automate. But a domain expert who isn’t an “analyst” can go extremely far with well designed automation and a sense of what rational results should look like. Obviously the state of the art isn’t perfect but you asked about goals, so those would be my goals.
“Thank you for your request. Can you walk me through the steps you’d use to do this manually? What things would you watch out for? What kind of number ranges are reasonable? I can propose an algorithm and you tell me if that’s correct. The admins have set up guidelines on how to reason about customer and purchase data. Is the following consistent with your expectations?”
Quote: "While sampling, after every token, our inference engine will determine which tokens are valid to be produced next based on the previously generated tokens and the rules within the grammar that indicate which tokens are valid next. We then use this list of tokens to mask the next sampling step, which effectively lowers the probability of invalid tokens to 0. Because we have preprocessed the schema, we can use a cached data structure to do this efficiently, with minimal latency overhead."
I.e., mask any tokens that would produce something that isn't valid SQL in the given dialect, or, going further, that isn't a valid query for the given schema. I assume some structured-outputs capability is built into most assistants nowadays, so they have probably already explored this.
[1] https://openai.com/index/introducing-structured-outputs-in-t...
My recent endevour was with Gemini 2.5:
- Write me a simple todo app on cloudflare with auth0 authentication.
- Here's a simple todo on cloudflare. We import the @auth0-cloudflare and...
- Does that @auth0-cloudflare exists?
- Oh, it doesn't. I can give you a walkthrough on how to set up an account on auth0. Would you like me to?
- Yes, please.
- Here. I'm going to write the walkthrough in a document... (proceed to create an empty document)
- That seems to be an empty document.
- Oh, my bad. I'll produce it once more. (proceed to create another empty document)
- Seems like your md parsing library is broken, can you write it in chat instead?
- Yes... (your gemini trial has expired, would you like to pay $100 to continue?)
When someone uses the same tools as I do but seems to experience problems I do not have (these kinds of posts often describe how bad LLMs are, or how bad Google search is), I get a bit confused. Is there A/B testing going on? Am I just lucky? Am I inattentive to these weaknesses? Is it about prompting? Or the areas we work in? Do we actually use the same tools (i.e., the same models)?
Claude and Gemini are pretty decent at providing a small, tight function definition with well-defined parameters and output, but give them anything big and they start losing shit left and right.
All the vibecoding sessions I've seen have been pretty dead-easy stuff with a lot of boilerplate. Maybe I'm weird for just not writing a lot of boilerplate and relying on well-built, expressive abstractions.
Maybe in the future all of these assistants will offer something amazing, but in my experience there is more time invested in prompting than in just reading the relevant documentation and having a coherent design.
My suspicion is that many, (but not all please no flames) of the biggest boosters of AI coding are simply inexperienced. If this is true, it makes sense that they wouldn't recognize the numerous foot-guns in AI generated code.
* Variable naming
* Summarizing unfamiliar code
* Producing boilerplate code when I have examples
* Producing one-liners when I've forgotten the parameter order or API specification. I double check, but this is basically a Google that directly answers your question
* Pre-code brainstorming
* Code review. Depending on the language it can catch classes of problems that escape linters
In my experience it won't produce production-ready code, but it's great as a rubber duck and a second pair of eyes.
Meanwhile we get claims that the tools are as capable as a junior programmer, and CEOs believe that.
The whole thing feels like we're in a collective delusion because idiotic managers and C-suites are blindly lapping up the advertising slop coming from the AI companies.
Why are you so defensive about the tech?
Involved in any AI startups, perhaps?
Here are some headlines:
OpenAI explains why ChatGPT became too sycophantic [0]
Anthropic blames Claude AI for ‘embarrassing and unintentional mistake’ in legal filing [1]
xAI blames Grok’s obsession with white genocide on an ‘unauthorized modification’ [2]
So you see, these AI models can easily be prone to producing nonsensical errors unpredictably at any time.
[0] https://techcrunch.com/2025/04/29/openai-explains-why-chatgp...
[1] https://www.theverge.com/news/668315/anthropic-claude-legal-...
[2] https://techcrunch.com/2025/05/15/xai-blames-groks-obsession...
I see the promise for green-field projects.
I wish developers would make use of long table names and column names. For example, pcat_extension could have been named release_schema_1_0.product_category_extension. And cat_id2 could have been named category_id2.
Sometimes when I want to fine-tune a query, I challenge the AI to provide a better solution. I give it the already-optimized query and ask for better. I have never got a better answer: sometimes because the AI is hallucinating, sometimes because the changes it proposes don't work in a way that is beneficial. It is like an idiot parrot telling you what it overheard in the brothel: good info if it is a war brothel frequented by enemy officers in 1916, but not these days.
That's what read replicas with read-only access are for. Production db servers should not be open to random queries and usage by people. That's only for the app to use.
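In PostgreSQL terms, a minimal sketch of such a locked-down role (database, schema, and role names here are made up):

-- read-only role for ad-hoc / LLM-generated queries
CREATE ROLE analyst_ro LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE app TO analyst_ro;
GRANT USAGE ON SCHEMA public TO analyst_ro;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO analyst_ro;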
This was my experience as well. However, I have observed that things have been improving in this regard. Newer LLMs do perform much better, and I suspect they will continue to get better over time.
AI is just increasing the frequency of things turning to custard :)
At least for the only OLAP DB I use often, Amazon Redshift, that's a solved problem with Workload Management Queues. You can restrict those users' ability to consume too many resources.
For queries that are used for OLTP, I usually try to keep things relatively simple. If there is a reason for read queries that consume resources, those go to read replicas when strong consistency isn't required.
> If the user is a technical analyst or a developer asking a vague question, giving them a reasonable, but perhaps not 100% correct SQL query is a good starting point
> Out of the box, LLMs are particularly good at tasks like creative writing, summarizing or extracting information from documents.
I don't -think- this was written by an LLM, but it really pulls me out of the technical article.
As a quick aside there's one thing I wish SQL had that would make writing queries so much faster. At work we're using a DSL that has one operator that automatically generates joins from foreign key columns, just like
credit.CLIENT->NAME
And you get the clients table automatically joined into the query. Having to write ten to twenty joins for every query is by far the worst thing; everything else about writing SQL is not that bad.

DEFINE products_by_order AS
  orders o
  JOIN order_items oi ON o.order_id = oi.order_id
  JOIN products p ON oi.product_id = p.product_id
You could make it visible to the DB rather than just a macro, so it could optimise it by caching etc. Sort of like a view, but on demand.

select Movie {
id,
title,
actors: {
name
}
};
https://docs.geldata.com/learn/edgeql#select-objects

Although I think a good enough language server / IDE could automatically insert the join when you typed `credit.CLIENT->NAME`
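For comparison, here is roughly what the credit.CLIENT->NAME shorthand would have to expand to in plain SQL (table and key names are guesses, since the original DSL isn't shown in full):

-- hypothetical expansion, assuming credit.client_id references clients(id)
SELECT credit.id, clients.name
FROM credit
JOIN clients ON clients.id = credit.client_id;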
IMO any hope of really leveraging LLMs in this context needs this, plus human review of additions to a shared ontology/semantic layer, so most of the nuanced stuff is expressed simply and is reviewed by engineering before the business goes wild with it.
It's not like it's some obscure thing, it's absolutely ubiquitous.
Relatively speaking it's not very complicated, it's widely documented, has vast learning resources, and has some of the best ROI of any DSL. It's funny to joke that it looks like line noise, but really, there is not a lot to learn to understand 90% of the expressions people actually write.
It takes far longer to tell an AI what you want than to write a regex yourself.
My experience is the exact opposite. Writing anything but the simplest regex by hand still takes me significant time, and I've been using them for decades.
Getting an LLM to spit out a regex is so much less work. Especially since an LLM already knows the details of the different potential dialects of regex.
I use them to write regexes in PostgreSQL, Python, JavaScript, ripgrep... they've turned writing a regex from something I expect to involve a bunch of documentation diving to something I'll do on a whim.
Here's a recent example - my prompt included a copy of a PostgreSQL schema and these instructions:
Write me a SQL query to extract
all of my images and their alt
tags using regular expressions.
In HTML documents it should look
for either <img .* src="..." .*
alt="..." or <img alt="..." .*
src="..." (images may be self-
closing XHTML style in some
places). In Markdown they will
always be ![alt](src)
I ended up with 100 lines of SQL: https://gist.github.com/simonw/5b44a662354e124e33cc1d4704cdb...

The markdown portion of that is a good example of the kind of regex I don't enjoy writing by hand, due to the need to remember exactly which characters to escape and how:
(REGEXP_MATCHES(commentary,
'!\[([^\]]*)\]\(([^)]*)\)', 'g'))[2] AS src,
(REGEXP_MATCHES(commentary,
'!\[([^\]]*)\]\(([^)]*)\)', 'g'))[1] AS alt_text
Full prompt and notes here: https://simonwillison.net/2025/Apr/28/dashboard-alt-text/

A version of that regex that also trims surrounding whitespace:

(REGEXP_MATCHES(commentary,
'!\[\s*([^\]]*?)\s*\]\(\s*([^)]*?)\s*\)', 'g'))[2] AS src,
(REGEXP_MATCHES(commentary,
'!\[\s*([^\]]*?)\s*\]\(\s*([^)]*?)\s*\)', 'g'))[1] AS alt_text
That is just nitpicking a one-off example though, I understand your wider point.

I appreciate the LLM is useful for problems outside one's usual scope of comfort. I'm mainly saying that I think it's a skill where the "time economics" really are in favor of learning it and expanding your scope. As in, it does not take a lot of learning time before you're faster than the LLM for 90% of things, and those things occur frequently enough that your "learning time deficit" gets repaid quickly. Certainly not the case for all skills, but I truly believe regex is one of them, due to its small scope and ubiquitous application. The LLM can be used for the remaining 10% of really complicated cases.
As you've been using regex for decades, there is already a large subset of problems where you're faster than the LLM. So that problem space exists, it's all about how to tune learning time to right-size it for the frequency the problems are encountered. Regex, I think, is simple enough & frequent enough where that works very well.
It doesn't matter how fast I get at regex, I still won't be able to type any but the shortest (<5 characters) patterns out quicker than an LLM can. They are typing assistants that can make really good guesses about my vaguely worded intent.
As for learning deficit: I am learning so much more thanks to heavy use of LLMs!
Prior to LLMs the idea of using a 100 line PostgreSQL query with embedded regex to answer a mild curiosity about my use of alt text would have finished at the idea stage: that's not a high value enough problem for me to invest more than a couple of minutes, so I would not have done it at all.
Whenever I have worked on code smells (performance issues, fuzzy test failures, etc.), regex was third behind only poorly written SQL queries and network latency.
All-in-all, not a good experience for me. Regex is the one task that I almost entirely rely on GitHub Copilot in the 3-4 times a year I have to.
A shortcut to type in natural language and get something I can validate in seconds is really useful.
LLMs have been amazing in my experience putting together awk scripts or shell scripts in general. I've also discovered many more tools and features I wouldn't have otherwise.
Of course, there may be cases you didn't think of where it behaves incorrectly. But if that's true, you're just as likely to forget those cases when studying the expression to see "what it actually says". If you have tests, fixing a broken case (once you discover it) is easy to do without breaking the existing cases you care about.
So for me, getting an AI to write a regex, and writing some tests for it (possibly with AI help) is a reasonable way to work.
Libraries sometimes make weird choices.
In theory, theory and practice are the same; in practice, not really.
In the context of regex, you have to know which dialect and which programming language's version of regex you're targeting, for example. It's not really universal how all libs/languages work.
Thus the need to test.
I needed one to do something with Markdown which was a very internal BigCo thing to need to do, something I'd never have written without weird requirements in play. It wasn't that tricky, but going back trying to get LLMs to replicate it after the fact from the same description I was working from, they were hopeless. I need to dig that out again and try it on the latest models.
I’ve found that the vast majority of programmers today do not have any foundation in formal languages and/or the theory of computation (something that 10 years ago was pretty common to assume).
It used to be pretty safe to assume that everyone from perl hackers to computer science theorists understood regex pretty well, but I’ve found it’s increasingly a rare skill. While it used to be common for all programmers to understand these things, even people with a CS background view that as some annoying course they forgot as soon as the exam was over.
Which is why I would ask an AI to build it if it could.
Took me 25.75 seconds, including learning how the website worked. I actually solved it in ~15 seconds, but I hadn't realized I got the correct answer because it was far too simple.
This website is much better https://regexcrossword.com/challenges/experienced/puzzles/e9...
With an AI prompt you'll have to do the same thing, just more verbosely.
You will have to do what every programmer hates, write a full formal specification in English.
https://blog.codinghorror.com/regular-expressions-now-you-ha...
Thus, this is mainly just a tool for true experts to do less work and still get paid the same, not a tool for beginners to rise to the level of experts.
Obviously being able to at least read a bit of SQL and understanding the basic idea of relational databases helps loads.
But how do you know if the SQL is correct, or just happened to return results that match for one particular case?
> awesome-Text2SQL: https://github.com/eosphoros-ai/Awesome-Text2SQL
> Awesome-code-llm > Benchmarks > Text to SQL: https://github.com/codefuse-ai/Awesome-Code-LLM#text-to-sql
Sounds like a bunch of bespoke not-AI work is being done to make up for LLM limitations that point blank can’t be resolved.
I'm pretty comfortable with SQL but still found it a fabulous tool recently. I have a SQL database which describes a tree of some ~600k events. Each event is in a session (via session_id). Most events have a parent event, and trees of events can involve multiple sessions.
I wanted to add two derived columns to my events table. For each event, I wanted to record the root event of that event's tree and the root event within its session. I had code in typescript to do it - but unsurprisingly it was pretty slow. Well, it turns out you can write a recursive SQL query which can traverse the graph and populate those columns. I had no idea that was even possible.
ChatGPT managed it pretty well - though I ended up making a bunch of tweaks to the query it suggested to simplify it. I learned a bunch of SQL in the process - and that was cool! Obviously I could have read the SQL documentation and figured it out myself, but it was faster & easier using chatgpt. Writing SQL queries is a fantastic use case for LLMs.
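For anyone curious, the query presumably looked something like this recursive CTE (a sketch only; the events schema and column names are guesses based on the description above, not the actual query):

-- walk each tree from its roots downward, carrying the root id along;
-- the session root resets whenever the session changes mid-tree
WITH RECURSIVE roots AS (
    SELECT id, id AS root_id, session_id, id AS session_root_id
    FROM events
    WHERE parent_id IS NULL

    UNION ALL

    SELECT e.id,
           r.root_id,
           e.session_id,
           CASE WHEN e.session_id = r.session_id
                THEN r.session_root_id
                ELSE e.id END
    FROM events e
    JOIN roots r ON e.parent_id = r.id
)
UPDATE events
SET root_id = roots.root_id,
    session_root_id = roots.session_root_id
FROM roots
WHERE events.id = roots.id;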