Can I just say that Google AI Studio with latest Gemini is stunningly, amazingly, game changingly impressive.

It leaves Claude and ChatGPT's coding looking like they are from a different century. It's hard to believe these changes are coming in factors of weeks and months. Last month i could not believe how good Claude is. Today I'm not sure how I could continue programming without Google Gemini in my toolkit.

Gemini AI Studio is such a giant leap ahead in programming I have to pinch myself when I'm using it.

I'm really surprised more people haven't caught on. Claude can one shot small stuff of similar complexity, but as soon as you start to really push the model into longer, more involved use cases Gemini pulls way ahead. The context handling is so impressive, in addition to using it for coding agents, I use Gemini as a beta reader for a fairly long manuscript (~85k words) and it absolutely nails it, providing a high level report that's comparable to what a solid human beta reader would provide in seconds.
It is absolutely the greatest golden age in programming ever - all these infinitely wealthy companies spending bajillions competing on who can make the best programming companion.

Apart from the apologising. It's silly when the AI apologises with ever more sincere apologies. There should be no apologies from AIs.

You're absolutely right! My mistake. I'll be careful about apologizing too much in the future.
You sound like a Canadian LLM!
Eh?
I wish my AI would tell me when I'm going in the wrong direction, instead of just placating my stupid request over and over until I realize.. even though it probably could have suggested a smarter direction, but instead just told me "Great idea! "
I don't know if you have used 2.5, but it is the first model to disagree with directions I have provided...

"..the user suggests using XYZ to move forward, but that would be rather inefficient, perhaps the user is not totally aware of the characteristics of XYZ. We should suggest moving forward with ABC and explain why it is the better choice..."

  • redog
  • ·
  • 1 hour ago
  • ·
  • [ - ]
It really gave me a lot of push back once when I wanted to use a js library over a python one for a particular project. Like I gave it my demo code in js and it basically said, "meh, cute but use this python one because ...reasons..."
  • ·
  • 2 hours ago
  • ·
  • [ - ]
> It is absolutely the greatest golden age in programming ever

It depends, because you now have to pay in order to be able to compete against other programmers who're also using AI tools, it wasn't like that in what I'd call the true "golden age", basically the '90s - early part of the 2000s, when the internet was already a thing and one could put together something very cool with just a "basic" text editor.

Wow yeah I'm old enough to remember when the focus wasn't on the programmers, but on the people the programs were written for.

We used to serve others, but now people are so excited about serving themselves first that there's almost no talk of service to others at all anymore

companion or replacement?
they would replace entire software department until AI make bug because endless changes into your javascript framework then they would hire human again to make fix

we literally creating solution for our own problem

Or, just let their users deal with the bugs b/c churn will be less than the cost of developers.
  • Terr_
  • ·
  • 16 hours ago
  • ·
  • [ - ]
... Or saboteur. :p
[dead]
I also used it to "vibe write" a short story. I use it similarly to vibe coding, I give the theme and structure of the story along with the major sections and tensions and conflicts I want to express and then it filled in the words in my chosen style. I also created an editor persona and then we went back and forth between the editor and writer personas to refine the story.

The Omega Directive: https://snth.prose.sh/the_omega_directive

My writing process is a bit different from my coding process with AI, it's more of an iterative refinement process.

I tend to form the story arc in my head, and outline the major events in a timeline, and create very short summaries of important scenes, then use AI to turn those summaries into rough narrative outlines by asking me questions and then using my answers to fill in the details.

Next I'll feed that abbreviated manuscript into AI and brainstorm as to what's missing/where the flow could use improvement/etc with no consideration for prose quality, and start filling in gaps with new scenes until I feel like I have a compelling rough outline.

Then I just plow from beginning to end rewriting each chapter, first with AI to do a "beta" draft, then I rewrite significant chunks by hand to make things really sharp.

After this is done I'll feed the manuscript back into AI and get it to beta read given my target audience profile and ambitions for the book, and ask it to provide me feedback on how I can improve the book. Then I start editing based on this, occasionally adding/deleting scenes or overhauling ones that don't quite work based on a combination of my and AI's estimation. When Gemini starts telling me it can't think of much to improve the manuscript that's when it's time for human beta readers.

Thank you for sharing that. I'm going to try that up to "then I rewrite significant chunks by hand to make things really sharp". I'm not a writer a would have never dreamed of writing anything until I gave this a try. I've often had ideas for stories though and using Gemini to bring these to "paper" has felt like a superpower similar how it must feel for people who can't code but now can able to create apps thanks to AI. I think it's a really exciting time!

I've been wondering about what the legalities of the generated content are though since we know that a lot of the artistic source content was used without consent?C an I put the stories on my blog? Or, not that I wanted to, publish them? I guess people use AI generated code everywhere so I guess for practical purposes the cat is out the bag and won't be put back in again.

If you've put manual work into curating and assembling AI output, you have copyright. It's only not copyrightable if you had the AI one shot something.
That sounds very similar to my AI vibe writing process. Start with chapter outlines, then ask AI to fill in the details for each scene. Then ask AI to point out any plot holes or areas for improvement in the chapter (with relation to other chapters). Then go through chapter by chapter for a second rewrite doing the same thing. At ~100k words for a fan-fiction novel but expect to be at about 120k words after this latest rewrite.

https://frypatch.github.io/The-Price-of-Remembering/

And Gemini is free.
The first hit is always free
The investors know it. They're not competing to own this shit like it's gonna stay free.
Real.
We’re all paying for this. In this case, the costs are only abstract, rather than the competing subscription options that are indeed quite tangible _and_ abstract.
  • scuol
  • ·
  • 13 hours ago
  • ·
  • [ - ]
Well, as with many of Google's services, you pay with your data.

Pay-as-you-go with Gemini does not snort your data for their own purposes (allegedly...).

The cost to google lying about data privacy far exceeds the profit gained from using it. Alienate your most valuable customers (enterprise) so you can get 10% more training data? And almost certainly end up in a sea of lawsuits from them?

Not happening. Investors would riot.

Indeed, the first stage of the enshittification process requires mollycoddling the customer in a convincing manner.

Looking forward to stage 2 - start serving the advertisers while placating the users, and finally stage 3 - offering it all up to the investors while playing the advertisers off each other and continuing to placate the users.

Undoubtedly, but a significant positive aspect is the democratization of this technology that enables access for people who could not afford it, not productively, that is.
I’ve yet to see any llm proselytizers acknowledge this glaring fact:

Each new release is “game changing”.

The implication being the last release y’all said was “game changing” is now “from a different century”.

Do you see it?

For this to be an accurate and true assessment means you were wrong both before and wrong now.

I'm unsure I fully understand your contention.

Are you suggesting that a rush to hyperbole which you don't like means advances in a technology aren't groundbreaking?

Or is it that if there is more than one impressive advance in a technology, any advance before the latest wasn't worthy of admiration at the time?

[dead]
I'm not an LLM proselytiser but this makes no sense? It would almost make sense if someone were claiming there are only two possible games, the old one and the new one, and never any more. Who claims that?
[dead]
  • in_ab
  • ·
  • 12 hours ago
  • ·
  • [ - ]
I asked it to make some changes to the code it wrote. But it kept pumping out the same code with more and more comments to justify itself. After the third attempt I realized I could have done it myself in less time.
I use Gemini2.5 Pro through work and it is excellent. However, I use Claude 3.7 Sonnet via API for personal use using money added to their account.

I couldn’t find a way to use Gemini like a prepaid plan. I ain’t giving my credit card to Google for an LLM that can easily charge me hundreds or thousands of EUR.

Try OpenRouter. Load up with $20 of credits and use their API for a variety of models across providers, including Gemini. I think you pay ~5% extra for the OpenRouter service.
Do you work for OpenRouter?
  • ·
  • 2 hours ago
  • ·
  • [ - ]
I’ve felt the same, but what is the equivalent of Claude code in Google’s ecosystem?

I want something running in a VM I can safely let all tools execute without human confirmation and I want to write my own tools and plug them in.

Right now a pro max subscription with Claude code plus my MCP servers seems to be the sweet spot, and a cursory look at the Google ecosystem didn’t identify anything like it. Am I overlooking something?

I think using Aider[1] with Google's models is the closest.

It's my daily driver so far. I switch between the Claude and Gemini models depending on the type of work I'm doing. When I know exactly what I want, I use Claude. When I'm experimenting and discovering, I use Gemini.

[1]: https://aider.chat/docs/llms/gemini.html

  • Eezee
  • ·
  • 5 hours ago
  • ·
  • [ - ]
I tried it out because of your comment and the very first prompt Gemini 2.5 Pro hallucinated a non-existant plugin including detailed usage instructions.

Not really my idea of good.

Can you provide your prompt? This hasn't matched my experience. You can also try enabling search grounding in the right hand bar. You have to also explicitly tell it in your prompt to use grounding with Google Search, but I've had very good success with that even for recent or niche plugins/libraries.
  • th0ma5
  • ·
  • 51 minutes ago
  • ·
  • [ - ]
So glad we're pinning the success and learning of new technology on random anecdotes. Do pro AI people not see how untenable it is where everything is a rumor?
I guess it depends on the type of tasks you give it.

They all seem to work remarkably well writing typescript or python but in my experience, they fall short when it comes to shell and more broadly dev ops

Sorry sounds like a marketing plug.
  • lifty
  • ·
  • 10 hours ago
  • ·
  • [ - ]
Excuse my ignorance, but is the good experience somehow influenced by Google AI Studio as well or only by the capability of the model itself? I know Gemini 2.5 is good, have been using it myself for a while. I still switch between Sonnet and Gemini, because I feel Claude code does some things better.
You don't worry that you can't think anymore without paying google to think for you?
OK, a better scenario than that: for some reason they cut you off. They're a huge company, they don't really care, and you would have no recourse. Many people live this story. Where once you were a programmer, if Google convinces you to eliminate your self-reliance they can then remotely turn off you being a programmer. There are other people who will use those GPU cycles to be programmers! Google will still make money.
It always is for the first week. Then you find out that the last 10% matter a lot more than than the other 90%. And finally they turn off the high compute version and you're left with a brain dead model that loses to a 32b local model half the time.
If a user eventually creates half a dozen projects with an API key for each, and prompts Gemini side-by-side under each key, and only some of the responses are consistently terrible…

Would you expect that to be Google employing cost-saving measures?

  • ·
  • 2 hours ago
  • ·
  • [ - ]
How do you use it exactly? Does it integrate with any IDEs?
Jetbrains AI recently added (beta) access to Gemini Pro 2.5 and there's of course plugins like Continue.dev that provide access to pretty much anything with an API
Copilot has it in preview. I found it looks deeper on devops tasks in the Agent mode. But context matters, you should include everything and it will push. Now I switch between Cloude and Gemini when one of them starts going circles. Gemini certainly could have more context but Copilot clearly limits it. Didn't try with Studio key though, only default settings.
Zed supports it out of the box.
Give Cline + vscode a try. Make sure to implement the "memory bank"...see Cline docs at cline.bot
Roo Code + Roo Commander + Openrouter (connecting Gemini with Vertex AI) + Context7
Just install Cursor, it supports Gemini and many other LLMs right out of the box.
Unfortunately I cannot use Cursor, not until they fix https://github.com/getcursor/cursor/issues/598.

What about Zed or something else?

I have not used any IDEs like Cursor or Zed, so I am not sure what I should be using (on Linux). I typically just get on Claude (claude.ai) or ChatGPT and do everything manually. It has worked fine for me so far, but if there is a way to reduce friction, I am willing to give it a try. I do not really need anything advanced, however. I just want to feed it the whole codebase (at times), some documentation, and then provide prompts. I mostly care about support for Claude and perhaps Gemini (would like to try it out).

without wanting to sound overly sceptical, what exactly makes you think it performs so much better compared to claude and chatgpt?

Is there any concrete example that makes it really obvious? I had no such success with it so far and i would really like to see the clear cut between the gemini and the others.

  • insin
  • ·
  • 18 hours ago
  • ·
  • [ - ]
Is it just me or did they turn off reasoning mode in free Gemini Pro this week?

It's pretty useful as long as you hold it back from writing code too early, or too generally, or sometimes at all. It's a chronic over-writer of code, too. Ignoring most of what it attempts to write and using it to explore the design space without ever getting bogged down in code and other implementation details is great though.

I've been doing something that's new to me but is going to be all over the training data (subscription service using stripe) and have often been able to pivot the planned design of different aspects before writing a single line of code because I can get all the data it already has regurgitated in the context of my particular tech stack and use case.

They rolled out a new model a week ago which has a "bug" where in long chats it forgets to emit the tokens required for the UI to detect that it's reasoning. You can remind it that it needs to emit these tokens, which helps, or accept that it will sometimes fail to do it. I don't notice a deterioration in performance because it is still reasoning (you can tell by the nature of the output), it's just that those tokens aren't in <think> tags or whatever's required by the UI to display it as such.
I think reasoning in the studio is gated by load, and at the same time I wasn't seeing so much reasoning in AIstudio, I was getting vertex service overloaded calls pretty frequently on my agents.
Remember when Microsoft started to do good things? Big corps suck when they are on top and unchallenged. It's imperative to reduce their monopolies.
  • Gud
  • ·
  • 7 hours ago
  • ·
  • [ - ]
No, I don’t.
lmao
Absolutely agree. I really pushed it last week with a screenshot of a very abstract visualisation that we’d done in a Miro board of which we couldn’t find a library that did exactly what we wanted, so we turned to Gemini.

Essentially we were hoping to tie that to data inputs and have a system to regularly output the visualisation but with dynamic values. I bet my colleague it would one shot it: it did.

What I’ve also found is that even a sloppy prompt still somehow is reading my mind on what to do, even though I’ve expressed myself poorly.

Inversely, I’ve really found myself rejecting suggestions from ChatGPT, even o4-mini-high. It’s just doing so much random crap I didn’t ask and the code is… let’s say not as “Gemini” as I’d prefer.

> Gemini AI Studio is such a giant leap ahead in programming I have to pinch myself when I'm using it

Every time in the last three or four weeks, there is a post here about Gemini, the top comment, or one of the top comments is something along these lines. And every time I spend a few minutes making empirical tests to check if I made a mistake in cancelling my paid Gemini account after giving up on it...

So I just did a couple of tests sending the same prompt on some AWS related questions to Gemini Pro 2.5 (free) and Claude paid, and no, Claude still better.

Can you share the prompts?
Really? I get goofy random substitutions like sometimes from foreign languages. It also doesn't do good with my mini-tests of "can you write modern Svelte without inserting React" and "can you fix a borrow-checking issue in Rust with lifetimes, not Arc/Cell slop"

That doesn't mean it's worse than the others just not much better. I haven't found anything that worked better than o1-preview so far. How are you using it?

Is this distinct from using Gemini 2.5 Pro? If not, this doesn’t match my experience — I’ve been getting a lot of poorly designed TypeScript with an excess of very low quality comments.
The comments drive me nuts.

// Moved to foo.ts

Ok, great. That’s what git is for.

// Loop over the users array

Ya. I can read code at a CS101 level, thanks.

Are you talking about Firebase Studio?
> Today I'm not sure how I could continue programming without Google Gemini in my toolkit

Anyone else concerned about this kind of statements? Make no mistake, everyone. We are living in a LLM bubble (not an AI bubble as none of these companies are actually interested in AI as such as moving towards AGI). They are all trying to commercialise LLMs with some minor tweaks. I don't expect LLMs to make the kind of progress made by the first 3 iterations of GPT. And when the insanely hyped overvaluations crashed, the bubble WILL crash. You BETTER hope there is any money left to run this kind of tools at a profit or you will be back at Stackoverflow trying to relearn all the skills you lost using generative coding tools.

Nice try, Mr. Google.

But seriously, yeah, Gemini is pretty great.

Shhh!!! Normies will catch on and google will stop making it free.

But more seriously, they need to uncap temperature and allow more samplers if they want to really flex on their competition.

Can you explain what you mean by "uncapping" temperature and "samplers"? You can currently set temperature to whatever you want. Or do you want > 2 temp.
[dead]
I understand from a technical POV how this could be considered great news.

But I don't see how this is good news at all from a societal POV.

The last 15 or so years has seen an unprecedented rise in salaries for engineers, especially software engineers. This has brought an interest in the profession from people who would normally not have considered SW as a profession. I think this is both good and bad. It has brought new found wealth to more people, but it may have also diluted the quality of the talent pool. That said, I think it was mostly good.

Now with this game-changing efficiency from these AI tools, I'm sure we've seen an end to the glory days in terms of salaries for the SW profession.

With this gone, where else could relatively normal people achieve financial independence? Definitely not in the service industry.

Very sad.

  • a_imho
  • ·
  • 34 minutes ago
  • ·
  • [ - ]
I'm not fully bought the hype yet but actually think LLMs democratizing technical solutions would be a fantastic opportunity for both established players and newcomers. The more LLMs improve, the less of a moat technology is in itself.
I understand how, from a technical POV, electricity and electrification could be considered great news.

But I don't see how this is good news at all from a societal POV.

Think about all the lamplighters who lost their jobs. Streetlights just turn on now? Lamplighting used to be considered a stable job! And what about the ice cutters…

For real tho, it's not like there's nothing left to do — we still have potholes to fix, t-shirts to fold and illnesses to cure. Just the fact that many people continue to believe that wars are justified by resource scarcity shows we need technological progress.

how is that technological progress not fueling resource scarcity?
From what I understand, prior to the 1980s/90s lamplighters, waiters, factory workers, etc. could live comfortable lives on decent wages.

These days not so much.

From what I understand, life was Dickensian hell for many people. Communism wouldn’t have had much of a chance if everyone was pretty much able to live a decent life as a lamp lighter.
I can’t reconcile statements like this with my experience trying to code with LLMs. As soon as there’s any real complexity they spit out nonsense broken code that in some cases could take a long time to debug. Then when you correct it “You’re totally right, I’ll change it so that x y z”. If you weren’t a senior dev with loads of experience you wouldn’t be able to debug or correct the code these tools produce.
If you were a new dev now learning the ropes, with these AI coding tools available, I highly doubt you would gain the same "loads of experience".

Learning comes through struggle and it's too easy to bypass that struggle now. It's so much easier to get the answers from AI.

> Learning comes through struggle

I often find myself repeating this, although one would think it's well-known or even self-evident.

If there's no active struggle, there's no remaining knowledge, it's just fleeting information.

Aren't programmers supposed to build digital products for end users and this just makes it faster? more like POV from a person who got hired...EOD you need to think what you are doing for the bigger world and what the world can do for you because that is what your end user - your boss - is thinking. people just like to swim in their lane in their own little B2B world (i do X = i get Y) without ever stopping to think about anything except what is in front of them
  • zkry
  • ·
  • 6 hours ago
  • ·
  • [ - ]
Im curious why there's this sentiment in regarding advances in AI. High level programming languages didnt in the least bit take away the value of the SW profession, despite allowing a vast number more people to write software.

The amount and complexity of software will expand to its very outer bounds for which specialists will be required.

A better comparison I think is low-code platforms.

There are plenty of folks making a living using platforms like Salesforce and “clicks not code,” but it never led to an implosion of the SE job market. Just expanded the tech job pool. And it’s hard to imagine how that would have happened if everything needed to be coded.

Like how a growth in medical-paraprofessionals didn’t negate the need for doctors and nurses.

Sounds like there need to be measures to fix income inequality.
  • foldr
  • ·
  • 6 hours ago
  • ·
  • [ - ]
Software engineers earning enough to achieve financial independence are generally employed by FAANG or (indirectly) by venture capitalists who have more money than they know what to do with.

With all this money sloshing around, it takes only a little imagination to think of ways of channeling some of it to working people without employing them to write pointless (or in some cases actively harmful) software.

It's better for society to get much wealthier, much faster, by opening up the possibility for anyone to do advanced programming, than for a small class of anointed and studied elites to get rich via this exclusion. It's the opposite of sad. It's the best thing that ever happened for the productive use of a computer by a layperson since the invention of the search engine.
  • prmph
  • ·
  • 1 hour ago
  • ·
  • [ - ]
LLMs are not going to allow you to do advanced programming if you couldn't already do it by hand. The thing about LLMs is, they are a force multiplier, imperfectly, but I guess they are getting there. The overall vision (unless trivial), architecture, functionality-gaps-filling, revisions, etc. of an advanced project is not going to come from an LLM.

I personally don't think we are ever going to get to that point where I can give a simple propnmt and have an LLM generate a complex app ready to run. Think about what that would require:

1. The LLM would have to read my mind and extrapolate all the minute decisions I would make to implement the app based on the prompt.

2. Assuming the LLM can get past (1), it would have to basically be AGI to be able to implement pretty much whatever I can dream up.

3. If 2 & 3 above is somehow achieved, it would be economically very valuable, and you can bet that functionality is not going to be casually enabled in LLMs, for just anyone to use.

Sure, but this isn't "anyone" doing advanced programming, it's the LLM doing it. The humans get skill in using LLM, not programming, and whether this new skill will make anyone wealthy is an open question.

(Also, just by market logic, rare skills in demand are always paid more; I'm not sure why you're calling it an "exclusion". The education system in a lot of places might have that function, but that's a separate issue not helped by LLMs writing SQL?)

I contest that. A human using an LLM to program, is a human programming. Gaining skill with the LLM is gaining skill in programming. And the things most people are both able and willing to now create with LLMs are of vastly greater complexity than whatever they were doing before - so yes, it's advanced programming.

I also contest your definition of wealth. Society absolutely and obviously becomes wealthier when many more people are able to use computers for more advanced things. Just because that wealth doesn't directly appear as zeros in your financial statements doesn't mean the wealth hasn't been created.

I'm happy to accept your contest, but you should be aware that both of our opinions are only beliefs at this point and science will have an answer at some point in the future when the effects of LLMs on humans are understood better.

I do have to ask though, who do you think will pay the electricity bill for disenfranchised groups lacking wealth the most strongly to use LLMs? Some things might be free right now, but what do you think will happen when some of e.g. OpenAI's $300bn valuation is being collected?

In one of Stephen Boyd's lectures on convex optimization, he has some quip like "if your optimization problem is computationally intractable, you could try really hard to improve the algorithm, or you could just go on vacation for a few weeks and by the time you get back, computers will be fast enough to solve it."

I feel like that's actually true now with LLMs -- if some query I write doesn't get one-shotted, I don't bother with a galaxy-brain prompt; I just shelve it 'til next month and the next big OpenAI/Anthropic/Google model will usually crush it.

  • th0ma5
  • ·
  • 47 minutes ago
  • ·
  • [ - ]
Except here the core functionality changes day to day and hinges on specific word usage.
Has the pace of this slown down or I have just lost track of the narrative?

Feels like innovation in AI is rapidly changing from paradigm-shifting to incremental.

Try getting it to write a codepen sim of 3 rectangles parallel parking.
> I just shelve it 'til next month and the next big OpenAI/Anthropic/Google model will usually crush it.

1 month to write some code with LLM, that's quite the opposite of the promised productivity gain

  • danjc
  • ·
  • 11 hours ago
  • ·
  • [ - ]
The article comments "out of the box, LLMs are particularly good at tasks like creative writing" but I think this actually demonstrates the problem with the ai.

A writer won't think that they're good at creative writing. In fact, I'm pretty sure they'd think LLM's are terrible at creative writing.

In other words, to an expert in their field, they're not that good - at least not yet.

But to someone who is not an expert, they're unbelievably good - they're enabled to do something they had zero ability to do before.

Yes, but why is then everyone on HN claiming LLMs can code on expert level?
  • danjc
  • ·
  • 8 hours ago
  • ·
  • [ - ]
Fast for hammering out boilerplate, great for understanding something you've never done before. Much less value for field-frontier or novel work.
  • jeltz
  • ·
  • 3 hours ago
  • ·
  • [ - ]
Because the people claiming so are actually bad at coding. I suspect a lot of them actually work in non-coding positions. And while I can for sure see how LLMs can be useful, they code at the level of a junior dev fresh out of college, if I am being generous.
Most of the software I use feels like it is coded by a junior dev fresh out of college anyway. Slow, buggy, bloated, gobbles memory...
I would posit that most people on hackernews are actually not that experienced.
Techbro astroturfing. You don't really see the same level of OMG AI on other forums like Reddit. Same thing happened with cryptocurrencies, HN was inundated with plugs for them and the same behavior was downvoted severely elsewhere.
The game changer for me will be when AI stops hallucinating SDK methods. I often find myself asking ”show me how to do advanced concept X in somewhat niche Y sdk”, and while it produces confident answers, 90% of the time it is suggesting SDK methods that do not exist, so a lot of time is wasted just arguing about that
  • M4v3R
  • ·
  • 9 hours ago
  • ·
  • [ - ]
The current method of solving this is providing the AI with the documentation of the SDKs your code uses. Current LLMs have quite big context windows so you can feed them a lot of documentation. Some tools can even crawl multipage documentation and index them for the use of LLMs.
How do you do that practically/reliably? Would be great to just paste a link to the SDK Github repo, but doesn't seem to work (yet) in my experience
Simple way would be to use either Sonnet 3.7/5 or Gemini 2.5 pro in windsurf/cursor/aider and tell it to search the web, when you know an SDK is problematic (usually because it's new and not in the training set).

That's all it takes to get reliably excellent results. It's not perfect, but, at this point, 90% hallucinations on normal SDK usage strongly suggests poor usage of what is the current state of the art.

Where is also context7 mcp you could use, it does help sometimes
Bring LLMs they always hallucinate an API it would be great to have.

If you had something on the other side to hallucinate the API itself you could have a program that dreams itself into existence as you use it.

  • flir
  • ·
  • 8 hours ago
  • ·
  • [ - ]
"I think callTheExactMethodINeed() was a hallucination. Can you try again?"

Then it apologizes and gives the right answer. It's weird. We really need a new work for what they're doing, 'cos it ain't thinking.

Can someone please answer these questions because I still think AI stinks of a false promise of determinable accuracy:

Do you need an expert to verify if the answer from AI is correct? How is it time saved refining prompts instead of SQL? Is it typing time? How can you know the results are correct if you aren't able to do it yourself? Why should a junior (sorcerer's apprentice) be trusted in charge of using AI? No matter the domain, from art to code to business rules, you still need an expert to verify the results. Would they (and their company) be in a better place to design a solution to a problem themselves, knowing their own assumptions? Or just check of a list of happy-path results without a FULL knowledge of the underlying design? This is not just a change from hand-crafting to line-production, it's a change from deterministic problem-solving to near-enough is good enough, sold as the new truth in problem-solving. It smells wrong.

I can bring data here:

We recently did the first speed run where Louie.ai beat teams of professional cybersecurity analysts in an open competition, Splunk's annual Boss of the SOC. Think writing queries, wrangling Python, and scanning through 100+ log sources to answer frustratingly sloppy database questions:

- We get 100% correct for basic stuff in the first half that takes most people 5-15 minutes per question, and 50% correct in the second half that most people take 15-45+ minute per question, and most teams time out in that second half.

- ... Louie does a median 2-3min per question irrespective of the expected difficulty, so about 10X faster than a team of 5 (wall clock), and 30X less work (person hours). Louie isn't burnt out at the end ;-)

- This doesn't happen out-of-the-box with frontier models, including fancy reasoning ones. Likewise, letting the typical tool here burn tokens until it finds an answer would cost more than a new hire, which is why we measure as a speedrun vs deceptively uncapped auto-solve count.

- The frontier models DO have good intuition , understand many errors, and for popular languages, DO generate good text2query. We are generally happy with OpenAI for example, so it's more on how Louie and the operator uses it.

- We found we had to add in key context and strategies. You see a bit in Claude Code and Cursor, except those are quite generic, so would have failed as well. Intuitively in coding, you want to use types/lint/tests, and same but diff issues if you do database stuff. But there is a lot more, by domain, in my experience, and expecting tools to just work is unlikely, so having domain relevant patterns baked in and that you can extend is key, and so is learning loops.

A bit more on louie's speed run here: https://www.linkedin.com/posts/leo-meyerovich-09649219_genai...

This is our first attempt at the speed run. I expect Louie to improve: my answers represent the current floor, not the ceiling of where things are (dizzyingly) going. Happy to answer any other q's where data might help!

is a competition/speed run a realistic example?
Splunk Boss of the SOC is the realistic test, it is one of the best cyber ranges. Think effectively 30+ hours of tricky querying across 100+ real log source types (tables) with a variety of recorded cyber incidents - OS logs, AWS logs, alerting systems, etc. As I mentioned, the AI has to seriously look at the data too, typically several queries deep for the right answer, and a lot of rabbit holes before then - answers can't just skate by on schema. I recommend folks look at the questions and decide for themselves what this signifies. I personally gained a lot of respect for the team create the competition.

The speed run formulation for all those same questions helps measure real-world quality vs cost trade-offs. I don't find uncapped solve rates to be relevant to most scenarios. If we allowed infinite time, yes we would have scored even higher... But if our users also ran it that way, it would bankrupt them.

If anyone is in the industry, there are surprisingly few open tests here. That is another part of why we did BOTS. IMO sunlight here brings progress, and I would love to chat with others on doing more open benchmarks!

My 2 cents, building a tool in this space...

> Do you need an expert to verify if the answer from AI is correct?

If the underling data has a quality issue that is not obvious to a human, the AI will miss it too. Otherwise, the AI will correct it for you. But I would argue that it's highly probable that your expert would have missed it too... So, no, it's not a silver bullet yet, and the AI model often lacks enough context that humans have, and the capacity to take a step back.

> How is it time saved refining prompts instead of SQL?

I wouldn't call that "prompting". It's just a chat. I'm at least ~10x faster (for reasonable complex & interesting queries).

Same reason as why it's harder to solve a sudoku than it is to verify its correctness.
I should have made my post clearer :)

There isn't one perfect solution to SQL queries against complex systems.

A suduko has one solution.

A reasonably well-optimised SQL solution is what the good use of SQL tries to achieve. And it can be the difference between a total lock-up and a fast running of a script that keeps the rest of a complex system from falling over.

The number of solutions doesn't matter though. You can easily design a sudoku game that has multiple solutions, but it's still easier to verify a given solution than to solve it from scratch.

It's not even about whether or not the number of solutions is limited. A math problem can have unlimited amount of proofs (if we allow arbitrarily long proofs), but it's still easier to verify one than to come up with one.

Of course writing SQL isn't necessarily comparable to sudoku. But the difference, in the context of verifiability, is definitely not "SQL has no single solution."

If the current state of software is any indication, experts don't care much about optimization either.
  • neRok
  • ·
  • 2 hours ago
  • ·
  • [ - ]
I've recently started asking the free version of chat-gpt questions on how I might do various things, and it's working great for me - but also my questions come from a POV of having existing "domain knowledge".

So for example, I was mucking around with ffmpeg and mkv files, and instead of searching for the answer to my thought-bubble (which I doubt would have been "quick" or "productive" on google), I straight up asked it what I wanted to know;

  > are there any features for mkv files like what ffmpeg does when making mp4 files with the option `--movflags faststart`?
And it gave me a great answer!

  (...the answer happened to be based upon our prior conversation of av1 encoding, and so it told me about increasing the I-frame frequency).
Another example from today - I was trying to build mp4v2 but ran in to drama because I don't want to take the easy road and install all the programs needed to "build" (I've taken to doing my hobby-coding as if I'm on a corporate-PC without admin rights (windows)). I also don't know about "cmake" and stuff, but I went and downloaded the portable zip and moved the exe to my `%user-path%/tools/` folder, but it gave an error. I did a quick search, but the google results were grim, so I went to chat-gpt. I said;

  > I'm trying to build this project off github, but I don't have cmake installed because I can't, so I'm using a portable version. It's giving me this error though: [*error*]
And the aforementioned error was pretty generic, but chat-gpt still gave a fantastic response along the lines of;

  >  Ok, first off, you must not have all the files that cmake.exe needs in the same folder, so to fix do ..[stuff, including explicit powershell commands to set PATH variables, as I had told it I was using powershell before].
  >  And once cmake is fixed, you still need [this and that].
  >  For [this], and because you want portable, here's how to setup Ninja [...]
  >  For [that], and even though you said you dont want to install things, you might consider ..[MSVC instructions].
  >  If not, you can ..[mingw-w64 instructions].
  • neRok
  • ·
  • 1 hour ago
  • ·
  • [ - ]
[Going to give myself a self-reply here, but what-ev's. This is how I talk to chat-gpt, FYI]... So I happened to be shopping for a cheap used car recently, and we have these ~15 year old Ford SUV's in Aus that are comfortable, but heavy and thirsty. Also, they come in AWD and RWD versions. So I had a thought bubble about using an AWD "gearbox" in a RWD vehicle whilst connecting an electric motor to the AWD front "output", so that it could work as an assist. Here was my first question to chat-gpt about it;

  > I'm wondering if it would be beneficial to add an electric-assist motor to an existing petrol vehicle. There are some 2010 era SUV's that have relatively uneconomical petrol engines, which may be good candidates. That is because some of them are RWD, whilst some are AWD. The AWD gearbox and transfer case could be fitted to the RWD, leaving the transfers front "output" unconnected. Could an electric motor then be connected to this shaft, hence making it an input?
It gave a decent answer, but it was focused on the "front diff" and "front driveshaft" and stuff like that. It hadn't quite grasped what I was implying, although it knew what it was talking about! It brought up various things that I knew were relevant (the "domain knowledge" aspect), so I brought some of those things in my reply (like about the viscous coupling and torque split);

  > I mentioned the AWD gearbox+transfer into a RWD-only vehicle, thus keeping it RWD only. Thus both petrol+electric would be "driving" at the same time, but I imagine the electric would reduce the effort required from the petrol. The transfer case is a simple "differential" type, without any control or viscous couplings or anything - just simple gear ratio differences that normally torque-split 35% to the front and 65% to the rear. So I imagine the open-differential would handle the 2 different input speeds could "combine" to 1 output?
That was enough to "fix" its answer (see below). And IMO, it was a good answer!

I'm posting this because I read a thread on here yesterday/2-days-ago about people stuggling with their AI's context/conversation getting "poisoned" (their word). So whilst I don't use AI that much, I also haven't had issue with it, and maybe that's because of that way I converse with it?

---------

"Edit": Well, the conversation was too long for HN, so I put it here - https://gist.github.com/neRok00/53e97988e1a3e41f3a688a75fe3b...

  • ·
  • 3 hours ago
  • ·
  • [ - ]
the short answer: use a semantic layer.

It's the cleanest way to give the right context and the best place to pull a human in the loop.

A human can validate and create all important metrics (e.g. what does "monthly active users" really mean) then an LLM can use that metric definition whenever asked for MAU.

With a semantic layer, you get the added benefit of writing queries in JSON instead of raw SQL. LLM's are much more consistent at writing a small JSON vs. hundreds of lines of SQL.

We[0] use cube[1] for this. It's the best open source semantic layer, but there's a couple closed source options too.

My last company wrote a post on this in 2021[2]. Looks like the acquirer stopped paying for the blog hosting, but the HN post is still up.

0 - https://www.definite.app/

1 - https://cube.dev/

2 - https://news.ycombinator.com/item?id=25930190

  • ljm
  • ·
  • 19 hours ago
  • ·
  • [ - ]
> you get the added benefit of writing queries in JSON instead of raw SQL.

I’m sorry, I can’t. The tail is wagging the dog.

dang, can you delete my account and scrub my history? I’m serious.

You move all the tools to debug and inspect slow queries, in a completely unsupported JSON environment, with prompts not to make up column names. And this is progress?
  • ·
  • 18 hours ago
  • ·
  • [ - ]
The JSON compiles to SQL. Have you used a semantic layer? You might have a different opinion if you tried one.
As someone who actually wrote a JSON to (limited) SQL transpiler at $DAYJOB, as much fun as I had designing and implementing that thing and for as many problems it solved immediately, 'tail wagging the dog' is the perfect description.
LLMs are far more reliable at producing something like this:

    {
      "dimensions": [
        "users.state",
        "users.city",
        "orders.status"
      ],
      "measures": [
        "orders.count"
      ],
      "filters": [
        {
          "member": "users.state",
          "operator": "notEquals",
          "values": ["us-wa"]
        }
      ],
      "timeDimensions": [
        {
          "dimension": "orders.created_at",
          "dateRange": ["2020-01-01", "2021-01-01"]
        }
      ],
      "limit": 10
    }

than this:

    SELECT
      users.state,
      users.city,
      orders.status,
      sum(orders.count)
    FROM orders
    CROSS JOIN users
    WHERE
      users.state != 'us-wa'
      AND orders.created_at BETWEEN '2020-01-01' AND '2021-01-01'
    GROUP BY 1, 2, 3
    LIMIT 10;
This doesn't make sense.

From a schema standpoint, table `orders` presumably has a row per order, with columns like `user_id`, `status` (as you stated), `created_at` (same), etc. Why would there be a `count` column? What does that represent?

From a query standpoint, I'm not sure what this would accomplish. You want the cartesian product of `users` and `orders`, filtered to all states except Washington, and where the order was created in 2020? The only reason I can think of to use a CROSS JOIN would be if there is no logical link between the tables, but that doesn't make any sense for this, because users:orders should be a 1:M relationship. Orders don't place themselves.

I think what you might have meant would be:

    SELECT
      users.state,
      users.city,
      orders.status,
      COUNT(*)
    FROM users
    JOIN orders ON user.id = orders.user_id
    WHERE
      users.state != 'us-wa' AND
      orders.created_at BETWEEN '2020-01-01' AND '2021-01-01'
    GROUP BY 1, 2, 3
    LIMIT 10;
Though without an ORDER BY, this has no significant meaning, and is a random sampling at best.

Also, if you or anyone else is creating a schema like this, _please_ don't make this denormalized mess. `orders.status` is going to be extremely low cardinality, as is `users.state` (to a lesser extent), and `users.city` (to an even lesser extent, but still). Make separate lookup tables for `city` and/or `state` (you don't even need to worry about pre-populating these, you can use GeoNames[0]). For `status`, you could do the same, or turn them into native ENUM [1] if you'd like to save a lookup.

[0]: https://www.geonames.org

[1]: https://www.postgresql.org/docs/current/datatype-enum.html

The programming languages are more predictable than human. So the rules are much easier to be "compressed" after they're basically detected when fed with big data. Your two examples imho are easily interchangeable during follow-up conversation with a decent LLM. Tested this with the following prompt and fed a c fragment and an SQL-fragment, got in both cases something like your first one

> Please convert the following fragment of a programming language (auto-detect) into a json-like parsing information when language construct is represented like an object, fixed branches are represented like properties and iterative clauses (statement list for example) as array.

This may be the best comment on Hacker News ever.
I think that honor still belongs to "Did you win the Putnam?" [0] but this is definitely still in the top 5.

[0]: https://news.ycombinator.com/item?id=35079

You're right, it's a bit ridiculous. This is a perfect time to use xml instead of json.
Clearly the right solution is to use XML Object Notation, aka XON™!

JSON:

  {"foo": ["bar", 42]}
XON:

  <Object>
    <Property>
      <Key>foo</Key>
      <Value>
        <Array>
          <String>bar</String>
          <Number>42</Number>
        </Arra>
      </Value>
    </Property>
  </Object>
It gives you all the flexibility of JSON with the mature tooling of XML!

Edit: jesus christ, it actually exists https://sevenval.gitbook.io/flat/reference/templating/oxn

We had an IT guy who once bought an XML<->JSON server for $12,000. Very proud of his rack of "data appliances". It made XML like XON out of JSON and JSON that was a soup of elements attributes and ___content___, thus giving you the complexity of XML in JSON. I don't think it got used once by our dev team, and I'm pretty sure it never processed a byte of anything of value.
[flagged]
[flagged]
Mother of God. I can write JSON instead of a language designed for querying. What is the advantage? If I’m going to move up an abstraction layer, why not give me natural language? Lots of things turn a limited natural language grammar into SQL for you. What is JSON going to {do: for: {me}}?
Sorry, I couldn't parse that. You didn't quote your keys
I find it funny people are making fun of this while every ORM builds up an object representing the query and then compiles it to SQL. SQL but as a data structure you can manipulate has thousands of implementations because it solves a real problem. This time it's because LLMs have an easier time outputting complex JSON than SQL itself.
Any idea why that is?
>you get the added benefit of writing queries in JSON instead of raw SQL

^ kids, this is what AI-induced brainrot looks like.

A semantic layer would be great. It should be a structured layer designed to make relational queries easy to write. We could call it “structured data language” or maybe “structured query language”.

In all seriousness, I have some complaints about SQL (I think LINQ’s reordering of it is a good idea), but there’s no need to invent another layer on order for LLMs to be able to wrangle it.

The semantic layer for database queries is (roughly) the relational algebra.
>you get the added benefit of writing queries in JSON instead of raw SQL

You should have written your comment in JSON instead of raw English.

I agree that using a semantic layer is the best way to get better precision. It is almost like a cheatsheet for the AI.

But I would never use one that forced me to express my queries in JSON. The best implementations integrate right into the database so they become an integral part of regular your SQL queries, and as such also available to all your tools.

In my experience, from using the Exasol Semantic Layer, it can be a totally seamless experience.

still need someone to build the semantic layer, why not use text2sql or something similar for that
  • insin
  • ·
  • 15 hours ago
  • ·
  • [ - ]
Is it too late to rescue the phrase "one-shotted" or is it already too far gone, like "AI" and "agent"?
For some reason I can't get the image of someone swinging back shots of vodka/tequila every time I see "one-shotted" out of my head
Reminds me the "crypto" name overloading. It is clear that fanboys are jealous of competence.
Google may be getting AI to write good SQL, but they aren’t getting it to write good blog posts.
The blog post lacks lots of details and sounds more of a marketing piece and “Try this!”. They did not release the evals, a very basic architecture flow which is not novel nor any real world benchmarks that says how it worked expect some vague statements. Must have been generated by Gemini
LLMs are still not great at generating SQL. If Google had a breakthrough, it should be on the bird brain benchmark - (A Big Bench for Large-Scale Database Grounded Text-to-SQLs) https://bird-bench.github.io/

At the moment GCP are at 76%, humans are at 93%.

Nice! A little off topic but I spent years experimenting writing AI-like natural language wrappers for relational databases that would query meta data to get column names, etc. Peter Norvig, in doing a tech review for me for the second edition of my Java AI book made a comment that the NLP database example was much better than anything else in the book, so the code I sweated over off and on for years was probably pretty good, BUT!, compared to what you can build with LLMs today, my old NLP wrappers aren't good at all.

LLMs make some things that were difficult very easy now.

Good article!

For the problems where it would matter the most, these tools seem to help the least. The hardest problem domains don't have just one schema to worry about. They have hundreds. If you need to spin up a personal blog or todo list tracker, I have no doubt that Google, et. al. can take you exactly where you want to go.
and then add in ambiguity in the business terms / intention behind the query. still a big need for something like semantic layer or ontology to sit between business and at least right now that stuff hasn’t been automated away yet (it should be though)
Malloy [1] has a semantic layer [2]... and Model Context Protocol (MCP) support is being added through Publisher [3]. Something to keep an eye on. Seems like a great fit for LLMs.

[1] https://www.malloydata.dev/ [2] https://docs.malloydata.dev/documentation/user_guides/malloy... [3] https://github.com/malloydata/publisher

“Metrics: We combine user metrics and offline eval metrics, and employ both human and automated evaluation, particularly using LLM-as-a-judge techniques”.

I’m curious to know what people are doing to measure whether the customer got what they were looking for. Thumbs up/down seems insufficient to me.

The ability of the LLM to perform purely depends on having good knowledge of what is going to get asked and how, which is more complex than it sounds

What techniques are people having success with?

Training a 2nd agent as a qualitative evaluator works pretty well "LLM-as-a-judge". You train it with labeled critiques from experts, iterate a few times, then point it to your ground truth human-labelled-data ("golden dataset"). The quantitative output metric is human2ai alignment on the golden dataset, mix that with some expert judgment about the critique output by the ai as well.

Works pretty well for me, where you can typically get within the range of human2human variance.

Given that their first query example has a leading wildcard as a predicate (WHERE p.product_name LIKE '%shoe%') and doesn't take case into account, I have doubts.
Yeah there's still a long way to go. Until these things actually try to consistently spit out SARGable queries, look at the query plans, check for covering indexes, etc. they're going to write worse queries than an entry level data engineer.

I'm certain they'll get there soon, they're just not there yet.

> We will cover state-of-the-art [...] how we approach techniques that allows the system to offer virtually certified correct answers.

I don't need AI to generate perfect SQL, because I am never going to trust the output enough to copy/paste it — the risk of subtle semantic errors is too high, even if the code validates.

Instead, I find it helpful for AI to suggest approaches — after which I will manually craft the SQL, starting from scratch.

Explain that to the average manager or junior engineer, both who don’t care about your desire to build well but not fast.
> So now that we brought down prod for a day the new rule is no AI sql without three humans signing off on any queries.
  • Closi
  • ·
  • 15 hours ago
  • ·
  • [ - ]
If that’s the scenario, I would be asking why the testing pipeline didn’t catch this rather than why was the AI SQL wrong.
To offer a 3rd option - what testing pipeline? Incompetent managers aren't going to approve of developers "wasting their time" on writing high quality tests.
Because the testing pipeline isn't the real database.

Anyone that knows a database well can bring it down with a innocent looking statement that no one else will blink at.

Because the testing pipeline was generated by AI, and code-reviewed by AI, reading a PR description generated by AI.
It’s not true that I want to build “well but not fast” — I’m trying to add value, and both speed and reliability matter. My productivity is high and I don’t have trouble articulating why; my approach has generally (though not universally) been well received by management and colleagues.
  • hosel
  • ·
  • 16 hours ago
  • ·
  • [ - ]
Really? In my experience it’s been pretty good (using Pydantic)! I read over before I execute it, but it’s never done anything malicious.
I don't trust myself to craft a prompt in natural language which completely specifies my intent as codified with the precision of a programming language.

I also tend to turn to AI for advising me on difficult use cases, and most of the time it's for production code rather than one-offs. The easy cases, I just write myself because it's more mental effort to review code for subtle errors than it is to write it.

What is the relevance of Pydantic with SQL?
  • ·
  • 16 hours ago
  • ·
  • [ - ]
  • ·
  • 16 hours ago
  • ·
  • [ - ]
Hopefully your trust in yourself is warranted
I embrace my fallibility, and enthusiastically pursue testing, code reviews, staging environments, and so on to minimize the mistakes that make it through to production.

It seems to me that this skeptical mindset is consonant with handling AI output with care.

You'd rather trust in AI than yourself?
in writing good sql code? i definitely would

ai is not going to replace the senior sql expert with 20 years of battle experience in the short-term but support me who last dug into sql 15 years ago and needs to get a working sql query in a project. and ai usually does a better job than me copy pasting googled code in between quickly browsing through tutorials.

What’s the eventual goal of text to sql?

Is it to build a copilot for a data analyst or to get business insight without going through an analyst?

If it’s the latter - then imho no amount of text to sql sophistication will solve the problem because it’s impossible for a non analyst to understand if the sql is correct or sufficient.

These don’t seem like text2sql problems:

> Why did we hit only 80% of our daily ecommmerce transaction yesterday?

> Why is customer acquisition cost trending up?

> Why was the campaign in NYC worse than the same in SF?

> These don’t seem like text2sql problems:

Correct, but I would propose two things to add to your analysis:

1. Natural language text is a universal input to LLM systems

2. text2sql makes the foundation of retrieving the information that can help answer these higher-level questions

And so in my mind, the goals for text2sql might be a copilot (near-term), but the long-term is to have a good foundation for automating text2sql calls, comparing results, and pulling them into a larger workflow precisely to help answer the kinds of questions you're proposing.

There's clearly much work needed to achieve that goal.

yeah I agree with this - good text2sql is essential but just one part of a larger stack that will actually get there. Seems possible tho
To be fair, these don’t look like SQL problems either. SQL answers “what”, not “why” questions. The goal of text2sql is to free up analyst time to get through “what” much faster and - possibly- focus on “why” questions.
My observation is the latter, but I agree the results fall short of expectations. Business will often want last minute change in reporting, don't get what they want at the right time because lack of analysts, and hope having "infinite speed" will solve the problem.

But ofc the real issue is that if your report metrics change last minute, you're unlikely to get good report. That's a symptom of not thinking much about your metrics.

Also, reports / analysis generally take time because the underlying data are messy, lots of business knowledge encoded "out of band", and poor data infrastructure. The smarter analytics leaders will use the AI push to invest in the foundations.

Any algo that a human would follow can be built and tested. If you have 10 analysts you have 10 different skill levels, with differing understanding of the database and business context. So automation gives you a platform to achieve a floor of skill and knowledge. The humans can now be “at least this good or better”. A new analyst instantly gets better, faster.

I assume a useful goal would be to guide development of the system in coordination with experts, test it, have the AI explain all trade offs, potential bugs, sense check it against expected results etc.

Taste is hard to automate. Real insight is hard to automate. But a domain expert who isn’t an “analyst” can go extremely far with well designed automation and a sense of what rational results should look like. Obviously the state of the art isn’t perfect but you asked about goals, so those would be my goals.

But “text to sql” isn’t an algorithm.
The processes the people want the sql for are likely filled with algo’s. An exec wants info in a known domain, set up a text to sql system with lots of context and testing to generate queries. If they think they have something good, get an expert to test and productionise it.

“Thank you for your request. Can you walk me through the steps you’d use to do this manually? What things would you watch out for? What kind of number ranges are reasonable? I can propose an algorithm and you tell me if that’s correct. The admins have set up guidelines on how to reason about customer and purchase data. Is the following consistent with your expectations?”

This is the same fallacy as low-code/no-code. If you have to check a precise algorithm, you’re effectively coding, and you need a language with the same precision as a programming language.
Only if you want a production-ready output. To get execs able to self-feed enough, this works fine. Look, you don’t see value until it’s perfect. Good, other people do. I see your fallacy and raise you a false dichotomy.
The problem I see is how do you verify that the result of your text-to-sql is really what you were asking for, without understanding the SQL (or “the algorithm”)? It boils down to that you have to know what you are doing, and with the present state of art of AI we can’t have confidence in that.
  • ·
  • 16 hours ago
  • ·
  • [ - ]
I wonder if, for a given dialect (and even DDL), you could use that token masking technique similar to how that Structured Outputs [1] thing went:

Quote: "While sampling, after every token, our inference engine will determine which tokens are valid to be produced next based on the previously generated tokens and the rules within the grammar that indicate which tokens are valid next. We then use this list of tokens to mask the next sampling step, which effectively lowers the probability of invalid tokens to 0. Because we have preprocessed the schema, we can use a cached data structure to do this efficiently, with minimal latency overhead."

I.e. mask any tokens that would produce something that isn't valid SQL in the given dialect, or further, a valid query for the given schema. I assume some structured outputs capability is latent to most assistants nowadays, so they probably already have explored this

[1] https://openai.com/index/introducing-structured-outputs-in-t...

  • zeroq
  • ·
  • 14 hours ago
  • ·
  • [ - ]
Every once in a while I've been trying AI, since everyone and their mother told me to, so I comply.

My recent endevour was with Gemini 2.5:

  - Write me a simple todo app on cloudflare with auth0 authentication.
  - Here's a simple todo on cloudflare. We import the @auth0-cloudflare and...
  - Does that @auth0-cloudflare exists?
  - Oh, it doesn't. I can give you a walkthrough on how to set up an account on auth0. Would you like me to?
  - Yes, please.
  - Here. I'm going to write the walkthrough in a document... (proceed to create an empty document)
  - That seems to be an empty document.
  - Oh, my bad. I'll produce it once more. (proceed to create another empty document)
  - Seems like you're md parsing library is broken, can you write it in chat instead?
  - Yes... (your gemini trial has expired, would you like to pay $100 to continue?)
It's difficult to assess how typical your experience is; I tried your initial prompt (`Write me a simple todo app on cloudflare with auth0 authentication.` on gemini-2.5-pro-preview-05-06) and didn't get any mentions of @auth0-cloudfare, although I cannot verify if the answer is working as-is

https://pastebin.com/yfg0Zn0u

Shocked you got a different output from the stochastic token generator.
That's not the point. While there is a temperature setting and randomness involved, you can still benchmark and experience significant differences in the output between models and generations. I thus provided more details and the full output to make it easier for people to assess the context of the comment I replied to

When someone uses the same tools as I do but seem to experience problems I do not have - these kind of posts often describes how bad LLMs are or how bad Google search is - I get a bit confused. Is it A/B testing going on? Am I just lucky? Am I inattentive to these weaknesses? Is it about promoting? Or what areas we work in? Do we actually use the same tools (i.e., same models)?

The worse part is not even being trolled at AI roundabout. The worse part is gaslighting by people who then go on to imply that I'm dumb to not be able to 'guide' the model 'towards the solution', whatever the fuck that means. And this is after telling me that model is so smart to just know what I want.

Claude and Gemini are pretty decent at providing a small and tight function definition with well defined parameters and output, but anything big and it starts losing shit left and right.

All vibecoding sessions I've seen have been pretty dead easy stuff with lot of boilerplate, maybe I'm weird for just not writing a lot of boilerplate and rely on well-built expressive abstractions..

Remember, if AI couldn't solve your problem, you were probably using the wrong model. Did you try with o5-selfsuck-20250523-512B?
My least favorite part of this trend is the ageism. "Crusty curmudgeons are not up-to date with the latest bloat if they think RTFM is still a thing", "Oh, you didn't like ORMs? Did you try letting an AI generate code for your ORM?"

Maybe in the future all of these assistants will offer something amazing, but in my experience, there is more time invested in prompting that just reading the relevant documentation and having a coherent design.

My suspicion is that many, (but not all please no flames) of the biggest boosters of AI coding are simply inexperienced. If this is true, it makes sense that they wouldn't recognize the numerous foot-guns in AI generated code.

As an experienced coder, I find ai invaluable for a ton of stuff, nearly none of it writing production code.

* Variable naming

* Summarizing unfamiliar code

* Producing boilerplate code when I have examples

* Producing one-liners when I've forgotten the parameter order or API specification. I double check, but this is basically a Google that directly answers your question

* Pre-code brainstorming

* Code review. Depending on the language it can catch classes of problems that escape linters

In my experience it won't produce production-ready code, but it's great as a rubber duck and a second pair of eyes.

  • kubb
  • ·
  • 11 hours ago
  • ·
  • [ - ]
It’s brilliant because you can always shift the blame on the user. Wrong prompt, wrong model, should have used an agent and ran 3 models in parallel, etc.

Meanwhile we get claims that the tools are as capable as a junior programmer, and CEOs believe that.

Yeah it's my favourite argument. Apparently this magical tool that can replace engineers and can do and write anything needs you to write prompts so detailed that you could have just written the damn code yourself, and probably had an easier time with it to boot.

The whole thing feels like we're in a collective delusion because idiotic managers and C-suites are blindly lapping up the advertising slop coming from the AI companies.

  • Kiro
  • ·
  • 10 hours ago
  • ·
  • [ - ]
You're the one doing the gaslighting now. "It doesn't work for me, therefore it can't possibly work for anyone else."
At this point you should just take this as your secret weapon. Let people convince each other that AI can't do that thing, while you are one-shotting the exact thing with a cost of $0.05.
  • codr7
  • ·
  • 6 hours ago
  • ·
  • [ - ]
Which is a very reasonable conclusion given the kinds of errors it makes.

Why are you so defensive about the tech?

Involved in any AI startups, perhaps?

  • rvz
  • ·
  • 4 hours ago
  • ·
  • [ - ]
Most of them are involved (including the latest round of YC startups), who have VCs invested in them to boost "AI agents" all over the internet and ignoring the laughable errors it makes.

Here are some headlines:

OpenAI explains why ChatGPT became too sycophantic [0]

Anthropic blames Claude AI for ‘embarrassing and unintentional mistake’ in legal filing [1]

xAI blames Grok’s obsession with white genocide on an ‘unauthorized modification’ [2]

So you see, these AI models can easily be prone to producing nonsensical errors unpredictably at any time.

[0] https://techcrunch.com/2025/04/29/openai-explains-why-chatgp...

[1] https://www.theverge.com/news/668315/anthropic-claude-legal-...

[2] https://techcrunch.com/2025/05/15/xai-blames-groks-obsession...

  • zxexz
  • ·
  • 16 hours ago
  • ·
  • [ - ]
I find Gemini excellent for sql. Wouldn’t consider myself an expert in many things, but in sql and database design id consider myself close. I like writing queries and doing the architecture, and that’s where it’s exceptionally helpful. The massive context length combined with pointed questions means i can just dump the entire DDL, and ask “what am i missing?”. It really is an excellent tool for helping with times like checks and catching dumb errors on complex databases.
A question: Does anyone know how well AI does generating performative SQL in years-old production databases? In terms of speed of execution, locking, accuracy, etc.?

I see the promise for green-field projects.

It's very hit or miss. Claude does OK-ish, others less so. You have to explicitly state the DB and version, otherwise it will assume you have access to functions / features that may not exist. Even then, they'll often make subtle mistakes that you're not going to catch unless you already have good knowledge of your RDBMS. For example, at my work we're currently doing query review, and devs have created an AI recommendation script to aid in this. It recommended that we create a composite index on something like `(user_id, id)` for a query. We have MySQL. If you don't know (the AI didn't, clearly), MySQL implicitly has a copy of the PK in every secondary index, so while it would quite happily make that index for you, it would end up being `(user_id, id, id)` and would thus be 3x the size it needed to be.
Regarding the first issue: ” For example, even the best DBA in the world would not be able to write an accurate query to track shoe sales if they didn't know that cat_id2 = 'Footwear' in a pcat_extension table means that the product in question is a kind of shoe. The same is true for LLMs.

I wish developers would make use of long table names and column names. For example, pcat_extension could have been named release_schema_1_0.product_category_extension. And cat_id2 could have been named category_id2.

Wait... people need AI to write SQL ?
A junior in SQL would need AI to write things they're not sure about, the same way stackoverflow has helped us for many many years before AI. A senior in sql, and in fact any languages, would use AI to be accelerated (I know I do).
I see this comparison too often and I don't think it's fair. Stackoverflow has peer review.
no.but if I ask for a report in natural language, the AI needs to be able to write sql.
Most people here have not understood the relational model, so yes.
I’ve read most of Codd’s book on the subject and have written SQL on and off since the mid 90’s and I still need to look up the differences between the various joins anytime I use them.
No mention of knowing anything about the tables, versions or relational structure? Are we just assuming that's already given to the AI?
This is on howto to to write good SELECTS, not SQL. AI is good enough to also create schemas from spec, migrate, explore databases, testing etc which tgis article does not touch upon
Every time I've fed more than 5 migration files and asked Claude to make multiple across those files it fails, it does very badly in almost all cases, even on kinda basic schemas. I actually don't think LLMs grok complex migration files or sql that well at all.
Well that's a great startup idea if you're familiar with the domain.
Those days, we have many types of database tools—ORMs, query builders, and more. AI can help reduce the complexity and avoid lock-in to a specific tech stack. I love to write raw SQL.
Problem no. 2 (Understanding user intent) is relevant not only to writing SQL but also to software development in general. Follow-up questions are something I had in mind for a long time. I wonder why this is not the default for LLMs.
  • iddan
  • ·
  • 10 hours ago
  • ·
  • [ - ]
For anybody wanting to use best-in-class AI SQL, I highly recommend checking out Sherloq (W23): https://www.sherloqdata.io/
I have done this using the OpenAI 4o model. I had to pass in a prompt with business-specific instructions, industry jargon, and descriptions of tables, including foreign keys. Then it would generate even complex join queries and return data. In my case, I was more interested in providing results to users not knowledgeable about SQL, but the SQL was displayed for information.
In real life I find using AI for SQL dangerous. It allows people that don't know what they do to write queries that can significantly impact servers. In my world databases are relatively big for most developers, but not huge.

Sometimes when I want to fine tune a query I am challenging AI to provide a better solution. I give it the already optimized query and I ask for better. I never got a better answer, sometimes because AI is hallucinating or because the changes that it proposes are not working in a way that is beneficial, it is like an idiot parrot is telling what it overheard in the brothel - good info if it is a war brothel frequented by enemy officers in 1916, but not these days.

It should never be at the point where some random person can impact a server.

That's what read replicas with read-only access are for. Production db servers should not be open to random queries and usage by people. That's only for the app to use.

Unless you have a much more regimented code review process than anywhere I've seen, "a random person" can impact prod quite easily by introducing a bad query into the app. Since ORMs are rampant, it's probably heavily obfuscated to begin with, so they won't even see the raw SQL. At best, they'll have run it on stage, where the DB size is probably so tiny that its performance issues go unnoticed.
How it should be and how it is, that depends on who is the decision maker. If the decision maker is a technical person, there is no gap, but in my case the decision maker is a non-technical manager with no competence to make such decisions, but that is the way the company is organized. So letting people use AI to dig through a 1 TB database is not a good idea, while not using AI prevents them to even try. Security by oblivion.
> I give it the already optimized query and I ask for better. I never got a better answer..

This was my experience as well. However I have observed that things have been improving this regard. Newer LLMs do perform much better. And I suspect they will continue to get better over time.

I’ve been working on highly optimized code that heavily uses CPU intrinsics, a year ago no chance, 6 months ago a helpful reference, today it’s a good starting point. That is an insane pace of improvement.
The strategy I've used with these people is to let them prototype with AI and then have them hand over their work to me where I can then make it significantly more efficient. The nice thing is that their poor performing version acts as a reference for validating the output of my queries.
Mate, IME programmers who don't know what they are doing just do it anyways then look to blame someone/something else if things turn to custard.

AI is just increasing the frequency of things turning to custard :)

AI is most effective as an accountability sink
> It allows people that don't know what they do to write queries that can significantly impact servers.

At least for the only OLAP DB I use often - Amazon Redshift - that’s a solved problem with Workload Management Queues. You can restrict those users ability to consume too many resources.

For queries that are used for OLTP, I usually try to keep those queries relatively simple. If there is a reason for read queries that consume resources , those go to read replicas when strong consistently isn’t required

If a lot of the value in a company is the software and over time a handful of AI companies start writing all the software, who really ends up owning all the value of the company?
That’s easy. None of the value is in the software. The only value is in customers that use the software.
Is it me or is the grammar of this article really poor:

> If the user is a technical analyst or a developer asking a vague question, giving them a reasonable, but perhaps not 100% correct SQL query is a good starting point

> Out of the box, LLMs are particularly good at tasks like creative writing, summarizing or extracting information from documents.

I don't -think- this was written by an LLM, but it really pulls me out of the technical article.

Out of all the AI tools and models I’ve tried, the most disappointing is the Gemini built into BigQuery. Despite having well named columns with good descriptions it consistently gets nowhere close to solving the problem.
Having written more SQL than any other programming language by now, every time I've tried to use AI to write the query for me, I'd spend way more time getting the output right than if I'd just written it myself.

As a quick aside there's one thing I wish SQL had that would make writing queries so much faster. At work we're using a DSL that has one operator that automatically generates joins from foreign key columns, just like

    credit.CLIENT->NAME
And you got clients table automatically joined into the query. Having to write ten to twenty joins for every query is by far the worst thing, everything else about writing SQL is not that bad.
I'd like there to be a function or macro for a bunch of joins, say

    DEFINE products_by_order AS orders o JOIN order_items oi ON o.order_id = oi.order_id JOIN products p ON oi.product_id = p.product_id
You could make it visible to the DB rather than just a macro so it could optimise it by caching etc. Sort of like a view but on demand.
Sounds like a CTE?
That's one of the features of EdgeQL:

    select Movie {
      id,
      title,
      actors: {
        name
      }
    };

https://docs.geldata.com/learn/edgeql#select-objects

Although I think good enough language server / IDE could automatically insert the join when you typed `credit.CLIENT->NAME`

(Shameless plug) writing the same joins over and over (and refactoring when you update stuff) was one of my biggest boilerplate annoyances with SQL - I’ve tried to fix that while still keeping the rest of SQL in https://trilogydata.dev/
yeah we’re doing something similar under the hood at AstroBee. it’s way way way easier to handle joins this way.

imo any hope of really leveraging llms in this context needs this + human review on additions to a shared ontology/semantic layer so most of the nuanced stuff is expressed simply and reviewed by engineering before business goes wild with it

Having proper constraints and foreign keys that are clear is generally all that's needed in my experience. Are you sure your tables have well defined constraints, so that the AI can be absolutely 100% sure how everything links up? SQL is very precise, but only if you're utilizing constraints and foreign key definitions well.
It’s BigQuery, so it likely won’t have any of these.
BigQuery supports all those SQL things I mentioned.
I’m just saying it’s likely they aren’t using them. But clearly you should if you want LLMs to do anything useful.
o3 has yet to fail me on complex, multi-table queries. Not a fan of BigQuery’s Gemini integration.
For me the flash model is way better than the pro model. I don't want to wait all the extra time to get some code back that I'm going to have to read and modify anyway. I much prefer getting a 92.5% right answer now, than 95% correct answer a minute or minutes from now.
Are those percentages real, or plucked out of the air? I find Pro's quality starkly higher
AI text to regex solutions would be incredibly handy.
This comment appears frequently and always surprises me. Do people just... not know regex? It seems so foreign to me.

It's not like it's some obscure thing, it's absolutely ubiquitous.

Relatively speaking it's not very complicated, it's widely documented, has vast learning resources, and has some of the best ROI of any DSL. It's funny to joke that it looks like line noise, but really, there is not a lot to learn to understand 90% of the expressions people actually write.

It takes far longer to tell an AI what you want than to write a regex yourself.

"It takes far longer to tell an AI what you want than to write a regex yourself."

My experience is the exact opposite. Writing anything but the simplest regex by hand still takes me significant time, and I've been using them for decades.

Getting an LLM to spit out a regex is so much less work. Especially since an LLM already knows the details of the different potential dialects of regex.

I use them to write regexes in PostgreSQL, Python, JavaScript, ripgrep... they've turned writing a regex from something I expect to involve a bunch of documentation diving to something I'll do on a whim.

Here's a recent example - my prompt included a copy of a PostgreSQL schema and these instructions:

  Write me a SQL query to extract
  all of my images and their alt
  tags using regular expressions.
  In HTML documents it should look
  for either <img .* src="..." .*
  alt="..." or <img alt="..." .*
  src="..." (images may be self-
  closing XHTML style in some 
  places). In Markdown they will
  always be ![alt text](url)
I ended up with 100 lines of SQL: https://gist.github.com/simonw/5b44a662354e124e33cc1d4704cdb...

The markdown portion of that is a good example of the kind of regex I don't enjoy writing by hand, due to the need to remember exactly which characters to escape and how:

  (REGEXP_MATCHES(commentary,
  '!\[([^\]]*)\]\(([^)]*)\)', 'g'))[2] AS src,
  (REGEXP_MATCHES(commentary,
  '!\[([^\]]*)\]\(([^)]*)\)', 'g'))[1] AS alt_text
Full prompt and notes here: https://simonwillison.net/2025/Apr/28/dashboard-alt-text/
Perhaps Perl has given me Stockholm Syndrome, but when I look at your escaped regex example, it's extremely natural for me. In fact, I'd say it's a little too simple, because the LLM forgot to exclude unnecessary whitespace:

  (REGEXP_MATCHES(commentary,
  '!\[\s*([^\]]*?)\s*\]\(\s*([^)]*?)\s*\)', 'g'))[2] AS src,
  (REGEXP_MATCHES(commentary,
  '!\[\s*([^\]]*?)\s*\]\(\s*([^)]*?)\s*\)', 'g'))[1] AS alt_text
That is just nitpicking a one-off example though, I understand your wider point.

I appreciate the LLM is useful for problems outside one's usual scope of comfort. I'm mainly saying that I think it's a skill where the "time economics" really are in favor of learning it and expanding your scope. As in, it does not take a lot learning time before you're faster than the LLM for 90% of things, and those things occur frequently enough that your "learning time deficit" gets repaid quickly. Certainly not the case for all skills, but I truly believe regex is one of them due to its small scope and ubiquitous application. The LLM can be used for the remaining 10% of really complicated cases.

As you've been using regex for decades, there is already a large subset of problems where you're faster than the LLM. So that problem space exists, it's all about how to tune learning time to right-size it for the frequency the problems are encountered. Regex, I think, is simple enough & frequent enough where that works very well.

> As in, it does not take a lot learning time before you're faster than the LLM for 90% of things, and those things occur frequently enough that your "learning time deficit" gets repaid quickly.

It doesn't matter how fast I get at regex, I still won't be able to type any but the shortest (<5 characters) patterns out quicker than an LLM can. They are typing assistants that can make really good guesses about my vaguely worded intent.

As for learning deficit: I am learning so much more thanks to heavy use of LLMs!

Prior to LLMs the idea of using a 100 line PostgreSQL query with embedded regex to answer a mild curiosity about my use of alt text would have finished at the idea stage: that's not a high value enough problem for me to invest more than a couple of minutes, so I would not have done it at all.

Good points. Also looking at your original example I noticed that not only humans can explain regularities they expect in many different ways (also correcting along the way), they can basically ask LLM to base the result on a reference. In your example you provided a template with an img tag and brackets having different attributes patterns. But one can also just ask for a html-style tag. As I did with the "Please create a regex for extracting image file names when in a text a html-style tag img is met" (not posting it here, but "src" is clearly visible in the answer). So the "knowledge" from other domains is applied to the regex creation.
  • nevf1
  • ·
  • 7 hours ago
  • ·
  • [ - ]
I respectfully disagree. Thankfully, I don't need to write regex much, so when I do it's always like it's the first time. I don't find the syntax particularly intuitive and I always rely on web-based or third party tools to validate my regex.

Whenever I have worked on code smells (performance issues, fuzzy test fails etc), regex was 3rd only to poorly written SQL queries, and/or network latency.

All-in-all, not a good experience for me. Regex is the one task that I almost entirely rely on GitHub Copilot in the 3-4 times a year I have to.

I know regex. But I use it so sparingly that every time I need it I forgot again the character for word boundary, or the character for whitespace, or the exact incantation for negative lookahead. Is it >!? who knows.

A shortcut to type in natural language and get something I can validate in seconds is really useful.

How do you validate it if you don’t know the syntax? Or are you saying that looking up syntax –> semantics is significantly quicker than semantics –> syntax? Which I don’t find to be the case. What takes time is grokking the semantics in context, which you have to do in both cases.
In my case most of my regex is for utility scripts for text processing. That means that I just run the script, and if it does what I want it to do I know I'm done.

LLMs have been amazing in my experience putting together awk scripts or shell scripts in general. I've also discovered many more tools and features I wouldn't have otherwise.

  • tough
  • ·
  • 17 hours ago
  • ·
  • [ - ]
That doesn’t answer the question. By “validate”, I mean “prove to yourself that the regular expression is correct”. Much like with program code, you can’t do that by only testing it. You need to understand what the expression actually says.
Testing something is the best way to prove that it behaves correctly in all the cases you can think of. Relying on your own (fallible) understanding is dangerous.

Of course, there may be cases you didn't think of where it behaves incorrectly. But if that's true, you're just as likely to forget those cases when studying the expression to see "what it actually says". If you have tests, fixing a broken case (once you discover it) is easy to do without breaking the existing cases you care about.

So for me, getting an AI to write a regex, and writing some tests for it (possibly with AI help) is a reasonable way to work.

I don’t believe this is true. That’s why we do mathematical proofs, instead of only testing all the cases one can think of. It’s important to sanity-check one’s understanding with tests, but mere black-box testing is no substitute for the understanding.
  • tough
  • ·
  • 2 hours ago
  • ·
  • [ - ]
Code is not perfect like math imho

libraries some times make weird choices

in theory theory and practice are the same, in practice not really

in the context of regex, you have to know which dialect and programming language version of regex you’re targeting for example. its not really universal how all libs/languages works

thus the need to test

Notice that site has a very usable reference list you can consult for all those details the GP forgets.
I was using perl in the late 90s for sysadmin stuff, have written web scrapers in python and have a solid history with regex. That being said, AI can still write really complex lookback/lookahead/nested extraction code MUCH faster and with fewer bugs than me, because regex is easy to make small mistakes with even when proficient.
There's often a bunch of edge cases that people overlook. And you also get quadratic behaviour for some fairly 'simple' looking regexes that few people seem aware of.
  • insin
  • ·
  • 18 hours ago
  • ·
  • [ - ]
IME it's not just longer, but also more difficult to tell the LLM precisely what you want than to write it yourself if you need a somewhat novel RegExp, which won't be all over the training data.

I needed one to do something with Markdown which was a very internal BigCo thing to need to do, something I'd never have written without weird requirements in play. It wasn't that tricky, but going back trying to get LLMs to replicate it after the fact from the same description I was working from, they were hopeless. I need to dig that out again and try it on the latest models.

I personally didn’t really understand how to write regex until I understood “regular languages” properly, then it was obvious.

I’ve found that the vast majority of programmers today do not have any foundation in formal languages and/or the theory of computation (something that 10 years ago was pretty common to assume).

It used to be pretty safe to assume that everyone from perl hackers to computer science theorists understood regex pretty well, but I’ve found it’s increasingly a rare skill. While it used to be common for all programmers to understand these things, even people with a CS background view that as some annoying course they forgot as soon as the exam was over.

I use regex as an alternative to wildcards in various apps like notepad++ and vscode. The format is different in each app. And the syntax is somewhat different. I have to research it each time. And complex regex is a nightmare.

Which is why I would ask an AI to build it if it could.

The first languge I used to solve real problems was perl, where regex is a first class citizen. In python less so, most of my python scripts don't use it. I love regex but know several developers who avoid it like plague. You don't know what you don't know, and there's nothing wrong with that. LLM's are super helpful for getting up to speed on stuff.
Regex, especially non standard (and non regular) extensions can be pretty tricky to grok.

http://alf.nu/RegexGolf?world=regex&level=r00

/foo/

took me 25.75 seconds, including learning how the website worked. I actually solved it in ~15 seconds, but I hadn't realized I got the correct answer becuase it was far too simple.

This website is much better https://regexcrossword.com/challenges/experienced/puzzles/e9...

  • tough
  • ·
  • 18 hours ago
  • ·
  • [ - ]
Its something you use so sparingly far away usually that never sticks around
A cheat sheet is just a web search away.
since you know so much regex, why dont you write a regex html parser /s
Can't wait for another regex-induced massive outage [0].

[0]: https://blog.cloudflare.com/details-of-the-cloudflare-outage...

"Text to SQL", "text to regex", "text to shell", etc. will never fundamentally work because the reason we have computer languages is to express specific requirements with no ambiguity.

With an AI prompt you'll have to do the same thing, just more verbosely.

You will have to do what every programmer hates, write a full formal specification in English.

Can't believe I'm seeing something from Google involving shoes but it isn't named gShoe.
All this LLM written SQL stuff sounds great until you realize if you don’t really know SQL you won’t be able to debug or fix any broken SQL an LLM generates.

Thus, this is mainly just a tool for true experts to do less work and still get paid the same, not a tool for beginners to rise to the level of experts.

It depends, sometimes just feeding back broken SQL with "that didn't return any rows, can you fix it" and it comes up with something that works. Or "you're looking at the wrong entity, look at this table instead" or whatever, without knowing how to write competent SQL.

Obviously being able to at least read a bit of SQL and understanding the basic idea of relational databases helps loads.

> It depends, sometimes just feeding back broken SQL with "that didn't return any rows, can you fix it" and it comes up with something that works.

But how do you know if the SQL is correct, or just happened to return results that match for one particular case?

how do you know SQL is correct if you write it yourself or another teamate of yours?
Because I know the language...?
if you know the language verify the llm output :)
I'm not an expert but I've written SQL on and off for years. LLMs help me when I can describe my intent but can't think how to implement it. I don't expect a perfect solution just a starting point that I can refine.
Have you not actually used LLMs? Just copy in the errors and away it goes.
Error goes away but it gives the wrong result.
If LLMs are so wonderful we can just read from B+ Tree storage engines directly. SQL, ORMs, Query Planners... all bloat.
great point
This is pretty simple in any foundation model, provide a well commented schema and ask for the query
Step 1: Your schema has thousands of tables and there aren't many comments.

Step 2...

Use AI to generate the comments of course
Exactly, add any documentation you have about the app for more context too.
  • fsndz
  • ·
  • 19 hours ago
  • ·
  • [ - ]
the smolagents library is also pretty nice to do the scaffolding around the model. Text to sql seems simple in demos, but to make it work in real life complex cases is very hard: https://medium.com/thoughts-on-machine-learning/build-a-text...
there’s two kinds of people using AI to generate SQL…those who say it’s already solved and those who say it’ll be impossible to ever solve
Yep, probably why I got upvoted then downvoted. I've been using LLMs to write SQL queries for 2 years.
I agree. There's really no magic to it any more. The table create DDL commands are a very precise description of the tables, so almost nothing more is ever needed. You can just describe in detail what query you need, and any decent LLM can do it just fine.
From "Show HN: We open sourced our entire text-to-SQL product" (2024) https://news.ycombinator.com/item?id=40456236 :

> awesome-Text2SQL: https://github.com/eosphoros-ai/Awesome-Text2SQL

> Awesome-code-llm > Benchmarks > Text to SQL: https://github.com/codefuse-ai/Awesome-Code-LLM#text-to-sql

> Even with a high-quality model, there is still some level of non-determinism or unpredictability involved in LLM-driven SQL generation. To address this we have found that non-AI approaches like query parsing or doing a dry run of the generated SQL complements model-based workflows well. We can get a clear, deterministic signal if the LLM has missed something crucial, which we then pass back to the model for a second pass. When provided an example of a mistake and some guidance, models can typically address what they got wrong.

Sounds like a bunch of bespoke not-AI work is being done to make up for LLM limitations that point blank can’t be resolved.

[dead]
[dead]
[dead]
[dead]
[flagged]
If you know SQL, then yeah! But if you don't know SQL, using an AI to write a few queries & debug them is a great way to learn it.

I'm pretty comfortable with sql but still found it a fabulous tool recently. I have a sql database which describes a tree of some ~600k events. Each event is in a session (via session_id). Most events have a parent event - and trees of events can involve multiple sessions.

I wanted to add two derived columns to my events table. For each event, I wanted to name the root event for that event's tree and the root event within this session. I had code in typescript to do it - but unsurprisingly it was pretty slow. Well, it turns out you can write a recursive SQL query which can traverse the graph and populate those columns. I had no idea that was even possible.

ChatGPT managed it pretty well - though I ended up making a bunch of tweaks to the query it suggested to simplify it. I learned a bunch of SQL in the process - and that was cool! Obviously I could have read the SQL documentation and figured it out myself, but it was faster & easier using chatgpt. Writing SQL queries is a fantastic use case for LLMs.

  • ·
  • 16 hours ago
  • ·
  • [ - ]