It leaves Claude and ChatGPT's coding looking like it's from a different century. It's hard to believe these changes are arriving in a matter of weeks and months. Last month I could not believe how good Claude was. Today I'm not sure how I could continue programming without Google Gemini in my toolkit.
Gemini AI Studio is such a giant leap ahead in programming I have to pinch myself when I'm using it.
Apart from the apologising. It's silly when the AI apologises with ever more sincere apologies. There should be no apologies from AIs.
"..the user suggests using XYZ to move forward, but that would be rather inefficient, perhaps the user is not totally aware of the characteristics of XYZ. We should suggest moving forward with ABC and explain why it is the better choice..."
It depends, because you now have to pay in order to compete against other programmers who're also using AI tools. It wasn't like that in what I'd call the true "golden age", basically the '90s through the early 2000s, when the internet was already a thing and one could put together something very cool with just a "basic" text editor.
We used to serve others, but now people are so excited about serving themselves first that there's almost no talk of service to others at all anymore.
We're literally creating a solution to our own problem.
The Omega Directive: https://snth.prose.sh/the_omega_directive
I tend to form the story arc in my head, and outline the major events in a timeline, and create very short summaries of important scenes, then use AI to turn those summaries into rough narrative outlines by asking me questions and then using my answers to fill in the details.
Next I'll feed that abbreviated manuscript into AI and brainstorm as to what's missing/where the flow could use improvement/etc with no consideration for prose quality, and start filling in gaps with new scenes until I feel like I have a compelling rough outline.
Then I just plow from beginning to end rewriting each chapter, first with AI to do a "beta" draft, then I rewrite significant chunks by hand to make things really sharp.
After this is done I'll feed the manuscript back into AI and get it to beta read given my target audience profile and ambitions for the book, and ask it to provide me feedback on how I can improve the book. Then I start editing based on this, occasionally adding/deleting scenes or overhauling ones that don't quite work based on a combination of my and AI's estimation. When Gemini starts telling me it can't think of much to improve the manuscript that's when it's time for human beta readers.
I've been wondering about the legalities of the generated content, though, since we know that a lot of the artistic source content was used without consent. Can I put the stories on my blog? Or, not that I want to, publish them? I guess people use AI-generated code everywhere, so for practical purposes the cat is out of the bag and won't be put back in again.
Pay-as-you-go with Gemini does not snort your data for their own purposes (allegedly...).
Not happening. Investors would riot.
Looking forward to stage 2 - start serving the advertisers while placating the users, and finally stage 3 - offering it all up to the investors while playing the advertisers off each other and continuing to placate the users.
Each new release is “game changing”.
The implication being the last release y’all said was “game changing” is now “from a different century”.
Do you see it?
For this to be an accurate and true assessment, you would have to have been wrong before and be wrong now.
Are you suggesting that a rush to hyperbole which you don't like means advances in a technology aren't groundbreaking?
Or is it that if there is more than one impressive advance in a technology, any advance before the latest wasn't worthy of admiration at the time?
I couldn’t find a way to use Gemini like a prepaid plan. I ain’t giving my credit card to Google for an LLM that can easily charge me hundreds or thousands of EUR.
I want something running in a VM I can safely let all tools execute without human confirmation and I want to write my own tools and plug them in.
Right now a pro max subscription with Claude code plus my MCP servers seems to be the sweet spot, and a cursory look at the Google ecosystem didn’t identify anything like it. Am I overlooking something?
It's my daily driver so far. I switch between the Claude and Gemini models depending on the type of work I'm doing. When I know exactly what I want, I use Claude. When I'm experimenting and discovering, I use Gemini.
Not really my idea of good.
They all seem to work remarkably well writing TypeScript or Python, but in my experience they fall short when it comes to shell and, more broadly, DevOps.
Would you expect that to be Google employing cost-saving measures?
https://developers.google.com/gemini-code-assist/docs/overvi...
What about Zed or something else?
I have not used any IDEs like Cursor or Zed, so I am not sure what I should be using (on Linux). I typically just get on Claude (claude.ai) or ChatGPT and do everything manually. It has worked fine for me so far, but if there is a way to reduce friction, I am willing to give it a try. I do not really need anything advanced, however. I just want to feed it the whole codebase (at times), some documentation, and then provide prompts. I mostly care about support for Claude and perhaps Gemini (would like to try it out).
Is there any concrete example that makes it really obvious? I have had no such success with it so far, and I would really like to see the clear difference between Gemini and the others.
It's pretty useful as long as you hold it back from writing code too early, or too generally, or sometimes at all. It's a chronic over-writer of code, too. Ignoring most of what it attempts to write and using it to explore the design space without ever getting bogged down in code and other implementation details is great though.
I've been doing something that's new to me but is going to be all over the training data (subscription service using stripe) and have often been able to pivot the planned design of different aspects before writing a single line of code because I can get all the data it already has regurgitated in the context of my particular tech stack and use case.
Essentially we were hoping to tie that to data inputs and have a system to regularly output the visualisation but with dynamic values. I bet my colleague it would one-shot it: it did.
What I’ve also found is that even a sloppy prompt still somehow is reading my mind on what to do, even though I’ve expressed myself poorly.
Inversely, I’ve really found myself rejecting suggestions from ChatGPT, even o4-mini-high. It’s just doing so much random crap I didn’t ask and the code is… let’s say not as “Gemini” as I’d prefer.
Every time there is a post here about Gemini, as there has been every few days for the last three or four weeks, the top comment, or one of the top comments, is something along these lines. And every time, I spend a few minutes running empirical tests to check whether I made a mistake in cancelling my paid Gemini account after giving up on it...
So I just did a couple of tests, sending the same prompts on some AWS-related questions to Gemini Pro 2.5 (free) and paid Claude, and no, Claude is still better.
That doesn't mean it's worse than the others just not much better. I haven't found anything that worked better than o1-preview so far. How are you using it?
// Moved to foo.ts
Ok, great. That’s what git is for.
// Loop over the users array
Ya. I can read code at a CS101 level, thanks.
Anyone else concerned about these kinds of statements? Make no mistake, everyone: we are living in an LLM bubble (not an AI bubble, as none of these companies are actually interested in AI as such, or in moving towards AGI). They are all trying to commercialise LLMs with some minor tweaks. I don't expect LLMs to make the kind of progress made by the first 3 iterations of GPT. And when the insanely hyped overvaluations crash, the bubble WILL burst. You had BETTER hope there is some money left to run these kinds of tools at a profit, or you will be back on Stack Overflow trying to relearn all the skills you lost to generative coding tools.
But seriously, yeah, Gemini is pretty great.
But more seriously, they need to uncap temperature and allow more samplers if they want to really flex on their competition.
But I don't see how this is good news at all from a societal POV.
The last 15 or so years has seen an unprecedented rise in salaries for engineers, especially software engineers. This has brought an interest in the profession from people who would normally not have considered SW as a profession. I think this is both good and bad. It has brought new found wealth to more people, but it may have also diluted the quality of the talent pool. That said, I think it was mostly good.
Now with this game-changing efficiency from these AI tools, I'm sure we've seen an end to the glory days in terms of salaries for the SW profession.
With this gone, where else could relatively normal people achieve financial independence? Definitely not in the service industry.
Very sad.
> But I don't see how this is good news at all from a societal POV.
Think about all the lamplighters who lost their jobs. Streetlights just turn on now? Lamplighting used to be considered a stable job! And what about the ice cutters…
For real tho, it's not like there's nothing left to do — we still have potholes to fix, t-shirts to fold and illnesses to cure. Just the fact that many people continue to believe that wars are justified by resource scarcity shows we need technological progress.
These days not so much.
Learning comes through struggle and it's too easy to bypass that struggle now. It's so much easier to get the answers from AI.
I often find myself repeating this, although one would think it's well-known or even self-evident.
If there's no active struggle, there's no remaining knowledge, it's just fleeting information.
The amount and complexity of software will expand to its very outer bounds, and specialists will be required to handle it.
There are plenty of folks making a living using platforms like Salesforce and “clicks not code,” but it never led to an implosion of the SE job market. Just expanded the tech job pool. And it’s hard to imagine how that would have happened if everything needed to be coded.
Like how a growth in medical-paraprofessionals didn’t negate the need for doctors and nurses.
With all this money sloshing around, it takes only a little imagination to think of ways of channeling some of it to working people without employing them to write pointless (or in some cases actively harmful) software.
I personally don't think we are ever going to get to the point where I can give a simple prompt and have an LLM generate a complex app ready to run. Think about what that would require:
1. The LLM would have to read my mind and extrapolate all the minute decisions I would make to implement the app based on the prompt.
2. Assuming the LLM can get past (1), it would have to basically be AGI to be able to implement pretty much whatever I can dream up.
3. If (1) and (2) are somehow achieved, it would be economically very valuable, and you can bet that functionality is not going to be casually enabled in LLMs for just anyone to use.
(Also, just by market logic, rare skills in demand are always paid more; I'm not sure why you're calling it an "exclusion". The education system in a lot of places might have that function, but that's a separate issue not helped by LLMs writing SQL?)
I also contest your definition of wealth. Society absolutely and obviously becomes wealthier when many more people are able to use computers for more advanced things. Just because that wealth doesn't directly appear as zeros in your financial statements doesn't mean the wealth hasn't been created.
I do have to ask, though: who do you think will pay the electricity bill for the groups most lacking in wealth to use LLMs? Some things might be free right now, but what do you think will happen when some of, e.g., OpenAI's $300bn valuation starts being collected?
I feel like that's actually true now with LLMs -- if some query I write doesn't get one-shotted, I don't bother with a galaxy-brain prompt; I just shelve it 'til next month and the next big OpenAI/Anthropic/Google model will usually crush it.
Feels like innovation in AI is rapidly changing from paradigm-shifting to incremental.
A month to write some code with an LLM: that's quite the opposite of the promised productivity gain.
A writer won't think that LLMs are good at creative writing. In fact, I'm pretty sure they'd think LLMs are terrible at creative writing.
In other words, to an expert in their field, they're not that good - at least not yet.
But to someone who is not an expert, they're unbelievably good - they're enabled to do something they had zero ability to do before.
That's all it takes to get reliably excellent results. It's not perfect, but, at this point, 90% hallucinations on normal SDK usage strongly suggests poor usage of what is the current state of the art.
If you had something on the other side to hallucinate the API itself you could have a program that dreams itself into existence as you use it.
Then it apologizes and gives the right answer. It's weird. We really need a new word for what they're doing, 'cos it ain't thinking.
Do you need an expert to verify if the answer from AI is correct? How is it time saved refining prompts instead of SQL? Is it typing time? How can you know the results are correct if you aren't able to do it yourself? Why should a junior (sorcerer's apprentice) be trusted in charge of using AI? No matter the domain, from art to code to business rules, you still need an expert to verify the results. Would they (and their company) be in a better place to design a solution to a problem themselves, knowing their own assumptions? Or just check off a list of happy-path results without a FULL knowledge of the underlying design? This is not just a change from hand-crafting to line-production, it's a change from deterministic problem-solving to near-enough-is-good-enough, sold as the new truth in problem-solving. It smells wrong.
We recently did the first speed run where Louie.ai beat teams of professional cybersecurity analysts in an open competition, Splunk's annual Boss of the SOC. Think writing queries, wrangling Python, and scanning through 100+ log sources to answer frustratingly sloppy database questions:
- We get 100% correct on the basic stuff in the first half, where most people take 5-15 minutes per question, and 50% correct in the second half, where most people take 15-45+ minutes per question and most teams time out.
- ... Louie does a median 2-3min per question irrespective of the expected difficulty, so about 10X faster than a team of 5 (wall clock), and 30X less work (person hours). Louie isn't burnt out at the end ;-)
- This doesn't happen out-of-the-box with frontier models, including fancy reasoning ones. Likewise, letting the typical tool here burn tokens until it finds an answer would cost more than a new hire, which is why we measure it as a speed run rather than a deceptively uncapped auto-solve count.
- The frontier models DO have good intuition, understand many errors, and, for popular languages, DO generate good text2query. We are generally happy with OpenAI, for example, so it's more about how Louie and the operator use it.
- We found we had to add in key context and strategies. You see a bit of this in Claude Code and Cursor, except those are quite generic, so they would have failed here as well. Intuitively, in coding you want to use types/lint/tests, and database work has similar-but-different issues. But in my experience there is a lot more, domain by domain, and expecting tools to just work is unlikely to pan out, so having domain-relevant patterns baked in, and extensible, is key, and so are learning loops.
A bit more on Louie's speed run here: https://www.linkedin.com/posts/leo-meyerovich-09649219_genai...
This is our first attempt at the speed run. I expect Louie to improve: my answers represent the current floor, not the ceiling of where things are (dizzyingly) going. Happy to answer any other q's where data might help!
The speed run formulation for all those same questions helps measure real-world quality vs cost trade-offs. I don't find uncapped solve rates to be relevant to most scenarios. If we allowed infinite time, yes we would have scored even higher... But if our users also ran it that way, it would bankrupt them.
If anyone is in the industry, there are surprisingly few open tests here. That is another part of why we did BOTS. IMO sunlight here brings progress, and I would love to chat with others on doing more open benchmarks!
> Do you need an expert to verify if the answer from AI is correct?
If the underlying data has a quality issue that is not obvious to a human, the AI will miss it too. Otherwise, the AI will correct it for you. But I would argue that it's highly probable your expert would have missed it too... So, no, it's not a silver bullet yet, and the AI model often lacks context that humans have, and the capacity to take a step back.
> How is it time saved refining prompts instead of SQL?
I wouldn't call that "prompting". It's just a chat. I'm at least ~10x faster (for reasonably complex & interesting queries).
There isn't one perfect solution to SQL queries against complex systems.
A sudoku has one solution.
A reasonably well-optimised SQL solution is what good use of SQL tries to achieve. And it can be the difference between a total lock-up and the fast run of a script that keeps the rest of a complex system from falling over.
It's not even about whether or not the number of solutions is limited. A math problem can have an unlimited number of proofs (if we allow arbitrarily long proofs), but it's still easier to verify one than to come up with one.
Of course writing SQL isn't necessarily comparable to sudoku. But the difference, in the context of verifiability, is definitely not "SQL has no single solution."
So for example, I was mucking around with ffmpeg and mkv files, and instead of searching for the answer to my thought-bubble (which I doubt would have been "quick" or "productive" on google), I straight up asked it what I wanted to know;
> are there any features for mkv files like what ffmpeg does when making mp4 files with the option `--movflags faststart`?
And it gave me a great answer! (...the answer happened to be based upon our prior conversation of av1 encoding, and so it told me about increasing the I-frame frequency).
Another example from today - I was trying to build mp4v2 but ran into drama because I don't want to take the easy road and install all the programs needed to "build" (I've taken to doing my hobby-coding as if I'm on a corporate-PC without admin rights (windows)). I also don't know about "cmake" and stuff, but I went and downloaded the portable zip and moved the exe to my `%user-path%/tools/` folder, but it gave an error. I did a quick search, but the google results were grim, so I went to chat-gpt. I said; > I'm trying to build this project off github, but I don't have cmake installed because I can't, so I'm using a portable version. It's giving me this error though: [*error*]
And the aforementioned error was pretty generic, but chat-gpt still gave a fantastic response along the lines of; > Ok, first off, you must not have all the files that cmake.exe needs in the same folder, so to fix do ..[stuff, including explicit powershell commands to set PATH variables, as I had told it I was using powershell before].
> And once cmake is fixed, you still need [this and that].
> For [this], and because you want portable, here's how to setup Ninja [...]
> For [that], and even though you said you dont want to install things, you might consider ..[MSVC instructions].
> If not, you can ..[mingw-w64 instructions].
> I'm wondering if it would be beneficial to add an electric-assist motor to an existing petrol vehicle. There are some 2010 era SUV's that have relatively uneconomical petrol engines, which may be good candidates. That is because some of them are RWD, whilst some are AWD. The AWD gearbox and transfer case could be fitted to the RWD, leaving the transfers front "output" unconnected. Could an electric motor then be connected to this shaft, hence making it an input?
It gave a decent answer, but it was focused on the "front diff" and "front driveshaft" and stuff like that. It hadn't quite grasped what I was implying, although it knew what it was talking about! It brought up various things that I knew were relevant (the "domain knowledge" aspect), so I brought some of those things in my reply (like about the viscous coupling and torque split); > I mentioned the AWD gearbox+transfer into a RWD-only vehicle, thus keeping it RWD only. Thus both petrol+electric would be "driving" at the same time, but I imagine the electric would reduce the effort required from the petrol. The transfer case is a simple "differential" type, without any control or viscous couplings or anything - just simple gear ratio differences that normally torque-split 35% to the front and 65% to the rear. So I imagine the open-differential would handle the 2 different input speeds could "combine" to 1 output?
That was enough to "fix" its answer (see below). And IMO, it was a good answer! I'm posting this because I read a thread on here yesterday/2-days-ago about people struggling with their AI's context/conversation getting "poisoned" (their word). So whilst I don't use AI that much, I also haven't had issues with it, and maybe that's because of the way I converse with it?
---------
"Edit": Well, the conversation was too long for HN, so I put it here - https://gist.github.com/neRok00/53e97988e1a3e41f3a688a75fe3b...
It's the cleanest way to give the right context and the best place to pull a human in the loop.
A human can validate and create all important metrics (e.g. what does "monthly active users" really mean) then an LLM can use that metric definition whenever asked for MAU.
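For instance, pinning that definition down once as a SQL view might look like this (a sketch only; the events table, its columns, and the 30-day window are all assumptions — the point is that a human signs off on the definition once):

CREATE VIEW monthly_active_users AS
-- "active" here means: at least one event in the last 30 days
SELECT count(DISTINCT user_id) AS mau
FROM events
WHERE event_time >= now() - interval '30 days';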
With a semantic layer, you get the added benefit of writing queries in JSON instead of raw SQL. LLMs are much more consistent at writing a small JSON query vs. hundreds of lines of SQL.
We[0] use cube[1] for this. It's the best open source semantic layer, but there are a couple of closed source options too.
My last company wrote a post on this in 2021[2]. Looks like the acquirer stopped paying for the blog hosting, but the HN post is still up.
I’m sorry, I can’t. The tail is wagging the dog.
dang, can you delete my account and scrub my history? I’m serious.
{
"dimensions": [
"users.state",
"users.city",
"orders.status"
],
"measures": [
"orders.count"
],
"filters": [
{
"member": "users.state",
"operator": "notEquals",
"values": ["us-wa"]
}
],
"timeDimensions": [
{
"dimension": "orders.created_at",
"dateRange": ["2020-01-01", "2021-01-01"]
}
],
"limit": 10
}
than this:

SELECT
users.state,
users.city,
orders.status,
sum(orders.count)
FROM orders
CROSS JOIN users
WHERE
users.state != 'us-wa'
AND orders.created_at BETWEEN '2020-01-01' AND '2021-01-01'
GROUP BY 1, 2, 3
LIMIT 10;
From a schema standpoint, table `orders` presumably has a row per order, with columns like `user_id`, `status` (as you stated), `created_at` (same), etc. Why would there be a `count` column? What does that represent?
From a query standpoint, I'm not sure what this would accomplish. You want the cartesian product of `users` and `orders`, filtered to all states except Washington, and where the order was created in 2020? The only reason I can think of to use a CROSS JOIN would be if there is no logical link between the tables, but that doesn't make any sense for this, because users:orders should be a 1:M relationship. Orders don't place themselves.
I think what you might have meant would be:
SELECT
users.state,
users.city,
orders.status,
COUNT(*)
FROM users
JOIN orders ON users.id = orders.user_id
WHERE
users.state != 'us-wa' AND
orders.created_at BETWEEN '2020-01-01' AND '2021-01-01'
GROUP BY 1, 2, 3
LIMIT 10;
Though without an ORDER BY this has no significant meaning, and is a random sampling at best.

Also, if you or anyone else is creating a schema like this, _please_ don't make this denormalized mess. `orders.status` is going to be extremely low cardinality, as is `users.state` (to a lesser extent), and `users.city` (to an even lesser extent, but still). Make separate lookup tables for `city` and/or `state` (you don't even need to worry about pre-populating these, you can use GeoNames[0]). For `status`, you could do the same, or turn it into a native ENUM[1] if you'd like to save a lookup.
[1]: https://www.postgresql.org/docs/current/datatype-enum.html
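To sketch that suggestion out (all table, column, and ENUM values here are illustrative, not from the original schema):

-- status becomes a native ENUM instead of free text
CREATE TYPE order_status AS ENUM ('pending', 'paid', 'shipped', 'cancelled');

-- low-cardinality location data moves into a lookup table
CREATE TABLE cities (
    id    serial PRIMARY KEY,
    name  text NOT NULL,
    state text NOT NULL
);

CREATE TABLE users (
    id      serial PRIMARY KEY,
    city_id integer REFERENCES cities (id)
);

CREATE TABLE orders (
    id         serial PRIMARY KEY,
    user_id    integer NOT NULL REFERENCES users (id),
    status     order_status NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);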
> Please convert the following fragment of a programming language (auto-detect) into a json-like parsing information when language construct is represented like an object, fixed branches are represented like properties and iterative clauses (statement list for example) as array.
JSON:
{"foo": ["bar", 42]}
XON: <Object>
<Property>
<Key>foo</Key>
<Value>
<Array>
<String>bar</String>
<Number>42</Number>
</Array>
</Value>
</Property>
</Object>
It gives you all the flexibility of JSON with the mature tooling of XML!

Edit: jesus christ, it actually exists https://sevenval.gitbook.io/flat/reference/templating/oxn
^ kids, this is what AI-induced brainrot looks like.
In all seriousness, I have some complaints about SQL (I think LINQ's reordering of it is a good idea), but there's no need to invent another layer in order for LLMs to be able to wrangle it.
You should have written your comment in JSON instead of raw English.
But I would never use one that forced me to express my queries in JSON. The best implementations integrate right into the database, so they become an integral part of your regular SQL queries and, as such, are also available to all your tools.
In my experience, from using the Exasol Semantic Layer, it can be a totally seamless experience.
At the moment GCP are at 76%, humans are at 93%.
LLMs make some things that were difficult very easy now.
Good article!
I’m curious to know what people are doing to measure whether the customer got what they were looking for. Thumbs up/down seems insufficient to me.
The ability of the LLM to perform purely depends on having good knowledge of what is going to get asked and how, which is more complex than it sounds
What techniques are people having success with?
Works pretty well for me, where you can typically get within the range of human2human variance.
I'm certain they'll get there soon, they're just not there yet.
I don't need AI to generate perfect SQL, because I am never going to trust the output enough to copy/paste it — the risk of subtle semantic errors is too high, even if the code validates.
Instead, I find it helpful for AI to suggest approaches — after which I will manually craft the SQL, starting from scratch.
Anyone that knows a database well can bring it down with an innocent-looking statement that no one else will blink at.
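A classic of the genre, for illustration (assuming a large users table with a plain index on email; schema is made up):

-- looks harmless, but lower() defeats the plain index on email
-- and the leading wildcard forces a full table scan
SELECT * FROM users WHERE lower(email) LIKE '%@example.com';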
I also tend to turn to AI for advising me on difficult use cases, and most of the time it's for production code rather than one-offs. The easy cases, I just write myself because it's more mental effort to review code for subtle errors than it is to write it.
It seems to me that this skeptical mindset is consonant with handling AI output with care.
AI is not going to replace the senior SQL expert with 20 years of battle experience in the short term, but it supports me, someone who last dug into SQL 15 years ago and needs to get a working SQL query in a project. And AI usually does a better job than me copy-pasting googled code in between quickly browsing through tutorials.
Is it to build a copilot for a data analyst or to get business insight without going through an analyst?
If it’s the latter - then imho no amount of text to sql sophistication will solve the problem because it’s impossible for a non analyst to understand if the sql is correct or sufficient.
These don’t seem like text2sql problems:
> Why did we hit only 80% of our daily ecommmerce transaction yesterday?
> Why is customer acquisition cost trending up?
> Why was the campaign in NYC worse than the same in SF?
Correct, but I would propose two things to add to your analysis:
1. Natural language text is a universal input to LLM systems
2. text2sql makes the foundation of retrieving the information that can help answer these higher-level questions
And so in my mind, the goals for text2sql might be a copilot (near-term), but the long-term is to have a good foundation for automating text2sql calls, comparing results, and pulling them into a larger workflow precisely to help answer the kinds of questions you're proposing.
There's clearly much work needed to achieve that goal.
But of course the real issue is that if your report metrics change at the last minute, you're unlikely to get a good report. That's a symptom of not thinking much about your metrics.
Also, reports/analyses generally take time because the underlying data are messy, lots of business knowledge is encoded "out of band", and the data infrastructure is poor. The smarter analytics leaders will use the AI push to invest in the foundations.
I assume a useful goal would be to guide development of the system in coordination with experts, test it, have the AI explain all trade offs, potential bugs, sense check it against expected results etc.
Taste is hard to automate. Real insight is hard to automate. But a domain expert who isn’t an “analyst” can go extremely far with well designed automation and a sense of what rational results should look like. Obviously the state of the art isn’t perfect but you asked about goals, so those would be my goals.
“Thank you for your request. Can you walk me through the steps you’d use to do this manually? What things would you watch out for? What kind of number ranges are reasonable? I can propose an algorithm and you tell me if that’s correct. The admins have set up guidelines on how to reason about customer and purchase data. Is the following consistent with your expectations?”
Quote: "While sampling, after every token, our inference engine will determine which tokens are valid to be produced next based on the previously generated tokens and the rules within the grammar that indicate which tokens are valid next. We then use this list of tokens to mask the next sampling step, which effectively lowers the probability of invalid tokens to 0. Because we have preprocessed the schema, we can use a cached data structure to do this efficiently, with minimal latency overhead."
I.e., mask any tokens that would produce something that isn't valid SQL in the given dialect, or, going further, that isn't a valid query for the given schema. I assume some structured-outputs capability is built into most assistants nowadays, so they have probably already explored this.
[1] https://openai.com/index/introducing-structured-outputs-in-t...
My recent endevour was with Gemini 2.5:
- Write me a simple todo app on cloudflare with auth0 authentication.
- Here's a simple todo on cloudflare. We import the @auth0-cloudflare and...
- Does that @auth0-cloudflare exists?
- Oh, it doesn't. I can give you a walkthrough on how to set up an account on auth0. Would you like me to?
- Yes, please.
- Here. I'm going to write the walkthrough in a document... (proceed to create an empty document)
- That seems to be an empty document.
- Oh, my bad. I'll produce it once more. (proceed to create another empty document)
- Seems like your md parsing library is broken, can you write it in chat instead?
- Yes... (your gemini trial has expired, would you like to pay $100 to continue?)
When someone uses the same tools as I do but seems to experience problems I do not have (these kinds of posts often describe how bad LLMs are, or how bad Google search is), I get a bit confused. Is there A/B testing going on? Am I just lucky? Am I inattentive to these weaknesses? Is it about prompting? Or the areas we work in? Do we actually use the same tools (i.e., the same models)?
Claude and Gemini are pretty decent at providing a small, tight function definition with well-defined parameters and output, but give them anything big and they start losing shit left and right.
All the vibecoding sessions I've seen have been pretty dead-easy stuff with a lot of boilerplate. Maybe I'm weird for just not writing a lot of boilerplate and relying on well-built, expressive abstractions.
Maybe in the future all of these assistants will offer something amazing, but in my experience there is more time invested in prompting than in just reading the relevant documentation and having a coherent design.
My suspicion is that many, (but not all please no flames) of the biggest boosters of AI coding are simply inexperienced. If this is true, it makes sense that they wouldn't recognize the numerous foot-guns in AI generated code.
* Variable naming
* Summarizing unfamiliar code
* Producing boilerplate code when I have examples
* Producing one-liners when I've forgotten the parameter order or API specification. I double check, but this is basically a Google that directly answers your question
* Pre-code brainstorming
* Code review. Depending on the language it can catch classes of problems that escape linters
In my experience it won't produce production-ready code, but it's great as a rubber duck and a second pair of eyes.
Meanwhile we get claims that the tools are as capable as a junior programmer, and CEOs believe that.
The whole thing feels like we're in a collective delusion because idiotic managers and C-suites are blindly lapping up the advertising slop coming from the AI companies.
Why are you so defensive about the tech?
Involved in any AI startups, perhaps?
Here are some headlines:
OpenAI explains why ChatGPT became too sycophantic [0]
Anthropic blames Claude AI for ‘embarrassing and unintentional mistake’ in legal filing [1]
xAI blames Grok’s obsession with white genocide on an ‘unauthorized modification’ [2]
So you see, these AI models can easily be prone to producing nonsensical errors unpredictably at any time.
[0] https://techcrunch.com/2025/04/29/openai-explains-why-chatgp...
[1] https://www.theverge.com/news/668315/anthropic-claude-legal-...
[2] https://techcrunch.com/2025/05/15/xai-blames-groks-obsession...
I see the promise for green-field projects.
I wish developers would make use of long table names and column names. For example, pcat_extension could have been named release_schema_1_0.product_category_extension. And cat_id2 could have been named category_id2.
Sometimes when I want to fine-tune a query, I challenge the AI to provide a better solution. I give it the already-optimized query and ask for better. I have never got a better answer: sometimes because the AI is hallucinating, sometimes because the changes it proposes don't work in a way that is beneficial. It is like an idiot parrot telling you what it overheard in the brothel: good info if it is a war brothel frequented by enemy officers in 1916, but not these days.
That's what read replicas with read-only access are for. Production db servers should not be open to random queries and usage by people. That's only for the app to use.
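In PostgreSQL terms, a minimal sketch of such a locked-down role (database, schema, and role names here are made up):

-- read-only role for ad-hoc / LLM-generated queries
CREATE ROLE analyst_ro LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE app TO analyst_ro;
GRANT USAGE ON SCHEMA public TO analyst_ro;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO analyst_ro;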
This was my experience as well. However, I have observed that things have been improving in this regard. Newer LLMs do perform much better, and I suspect they will continue to get better over time.
AI is just increasing the frequency of things turning to custard :)
At least for the only OLAP DB I use often, Amazon Redshift, that's a solved problem with Workload Management Queues. You can restrict those users' ability to consume too many resources.
For queries that are used for OLTP, I usually try to keep things relatively simple. If there is a reason for read queries that consume resources, those go to read replicas when strong consistency isn't required.
> If the user is a technical analyst or a developer asking a vague question, giving them a reasonable, but perhaps not 100% correct SQL query is a good starting point
> Out of the box, LLMs are particularly good at tasks like creative writing, summarizing or extracting information from documents.
I don't -think- this was written by an LLM, but it really pulls me out of the technical article.
As a quick aside there's one thing I wish SQL had that would make writing queries so much faster. At work we're using a DSL that has one operator that automatically generates joins from foreign key columns, just like
credit.CLIENT->NAME
And you get the clients table automatically joined into the query. Having to write ten to twenty joins for every query is by far the worst thing; everything else about writing SQL is not that bad.

DEFINE products_by_order AS
  orders o
  JOIN order_items oi ON o.order_id = oi.order_id
  JOIN products p ON oi.product_id = p.product_id
You could make it visible to the DB rather than just a macro, so it could optimise it by caching etc. Sort of like a view, but on demand.

select Movie {
id,
title,
actors: {
name
}
};
https://docs.geldata.com/learn/edgeql#select-objects

Although I think a good enough language server / IDE could automatically insert the join when you typed `credit.CLIENT->NAME`
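For comparison, here is roughly what the credit.CLIENT->NAME shorthand would have to expand to in plain SQL (table and key names are guesses, since the original DSL isn't shown in full):

-- hypothetical expansion, assuming credit.client_id references clients(id)
SELECT credit.id, clients.name
FROM credit
JOIN clients ON clients.id = credit.client_id;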
IMO any hope of really leveraging LLMs in this context needs this, plus human review of additions to a shared ontology/semantic layer, so most of the nuanced stuff is expressed simply and is reviewed by engineering before the business goes wild with it.
It's not like it's some obscure thing, it's absolutely ubiquitous.
Relatively speaking it's not very complicated, it's widely documented, has vast learning resources, and has some of the best ROI of any DSL. It's funny to joke that it looks like line noise, but really, there is not a lot to learn to understand 90% of the expressions people actually write.
It takes far longer to tell an AI what you want than to write a regex yourself.
My experience is the exact opposite. Writing anything but the simplest regex by hand still takes me significant time, and I've been using them for decades.
Getting an LLM to spit out a regex is so much less work. Especially since an LLM already knows the details of the different potential dialects of regex.
I use them to write regexes in PostgreSQL, Python, JavaScript, ripgrep... they've turned writing a regex from something I expect to involve a bunch of documentation diving to something I'll do on a whim.
Here's a recent example - my prompt included a copy of a PostgreSQL schema and these instructions:
Write me a SQL query to extract
all of my images and their alt
tags using regular expressions.
In HTML documents it should look
for either <img .* src="..." .*
alt="..." or <img alt="..." .*
src="..." (images may be self-
closing XHTML style in some
places). In Markdown they will
always be ![alt](src)
I ended up with 100 lines of SQL: https://gist.github.com/simonw/5b44a662354e124e33cc1d4704cdb...

The markdown portion of that is a good example of the kind of regex I don't enjoy writing by hand, due to the need to remember exactly which characters to escape and how:
(REGEXP_MATCHES(commentary,
'!\[([^\]]*)\]\(([^)]*)\)', 'g'))[2] AS src,
(REGEXP_MATCHES(commentary,
'!\[([^\]]*)\]\(([^)]*)\)', 'g'))[1] AS alt_text
Full prompt and notes here: https://simonwillison.net/2025/Apr/28/dashboard-alt-text/

A version of that regex that also trims surrounding whitespace:

(REGEXP_MATCHES(commentary,
'!\[\s*([^\]]*?)\s*\]\(\s*([^)]*?)\s*\)', 'g'))[2] AS src,
(REGEXP_MATCHES(commentary,
'!\[\s*([^\]]*?)\s*\]\(\s*([^)]*?)\s*\)', 'g'))[1] AS alt_text
That is just nitpicking a one-off example though, I understand your wider point.

I appreciate the LLM is useful for problems outside one's usual scope of comfort. I'm mainly saying that I think it's a skill where the "time economics" really are in favor of learning it and expanding your scope. As in, it does not take a lot of learning time before you're faster than the LLM for 90% of things, and those things occur frequently enough that your "learning time deficit" gets repaid quickly. Certainly not the case for all skills, but I truly believe regex is one of them, due to its small scope and ubiquitous application. The LLM can be used for the remaining 10% of really complicated cases.
As you've been using regex for decades, there is already a large subset of problems where you're faster than the LLM. So that problem space exists, it's all about how to tune learning time to right-size it for the frequency the problems are encountered. Regex, I think, is simple enough & frequent enough where that works very well.
It doesn't matter how fast I get at regex, I still won't be able to type any but the shortest (<5 characters) patterns out quicker than an LLM can. They are typing assistants that can make really good guesses about my vaguely worded intent.
As for learning deficit: I am learning so much more thanks to heavy use of LLMs!
Prior to LLMs the idea of using a 100 line PostgreSQL query with embedded regex to answer a mild curiosity about my use of alt text would have finished at the idea stage: that's not a high value enough problem for me to invest more than a couple of minutes, so I would not have done it at all.
Whenever I have worked on code smells (performance issues, fuzzy test failures, etc.), regex was third behind only poorly written SQL queries and network latency.
All-in-all, not a good experience for me. Regex is the one task that I almost entirely rely on GitHub Copilot in the 3-4 times a year I have to.
A shortcut to type in natural language and get something I can validate in seconds is really useful.
LLMs have been amazing in my experience putting together awk scripts or shell scripts in general. I've also discovered many more tools and features I wouldn't have otherwise.
Of course, there may be cases you didn't think of where it behaves incorrectly. But if that's true, you're just as likely to forget those cases when studying the expression to see "what it actually says". If you have tests, fixing a broken case (once you discover it) is easy to do without breaking the existing cases you care about.
So for me, getting an AI to write a regex, and writing some tests for it (possibly with AI help) is a reasonable way to work.
Libraries sometimes make weird choices.
In theory, theory and practice are the same; in practice, not really.
In the context of regex, you have to know which dialect and which programming language's version of regex you're targeting, for example. It's not really universal how all libs/languages work.
Thus the need to test.
I needed one to do something with Markdown which was a very internal BigCo thing to need to do, something I'd never have written without weird requirements in play. It wasn't that tricky, but going back trying to get LLMs to replicate it after the fact from the same description I was working from, they were hopeless. I need to dig that out again and try it on the latest models.
I’ve found that the vast majority of programmers today do not have any foundation in formal languages and/or the theory of computation (something that 10 years ago was pretty common to assume).
It used to be pretty safe to assume that everyone from perl hackers to computer science theorists understood regex pretty well, but I’ve found it’s increasingly a rare skill. While it used to be common for all programmers to understand these things, even people with a CS background view that as some annoying course they forgot as soon as the exam was over.
Which is why I would ask an AI to build it if it could.
Took me 25.75 seconds, including learning how the website worked. I actually solved it in ~15 seconds, but I hadn't realized I got the correct answer because it was far too simple.
This website is much better https://regexcrossword.com/challenges/experienced/puzzles/e9...
With an AI prompt you'll have to do the same thing, just more verbosely.
You will have to do what every programmer hates, write a full formal specification in English.
https://blog.codinghorror.com/regular-expressions-now-you-ha...
Thus, this is mainly just a tool for true experts to do less work and still get paid the same, not a tool for beginners to rise to the level of experts.
Obviously being able to at least read a bit of SQL and understanding the basic idea of relational databases helps loads.
But how do you know if the SQL is correct, or just happened to return results that match for one particular case?
> awesome-Text2SQL: https://github.com/eosphoros-ai/Awesome-Text2SQL
> Awesome-code-llm > Benchmarks > Text to SQL: https://github.com/codefuse-ai/Awesome-Code-LLM#text-to-sql
Sounds like a bunch of bespoke not-AI work is being done to make up for LLM limitations that point blank can’t be resolved.
I'm pretty comfortable with SQL but still found it a fabulous tool recently. I have a SQL database which describes a tree of some ~600k events. Each event is in a session (via session_id). Most events have a parent event, and trees of events can involve multiple sessions.
I wanted to add two derived columns to my events table. For each event, I wanted to record the root event of that event's tree and the root event within its session. I had code in typescript to do it - but unsurprisingly it was pretty slow. Well, it turns out you can write a recursive SQL query which can traverse the graph and populate those columns. I had no idea that was even possible.
ChatGPT managed it pretty well - though I ended up making a bunch of tweaks to the query it suggested to simplify it. I learned a bunch of SQL in the process - and that was cool! Obviously I could have read the SQL documentation and figured it out myself, but it was faster & easier using chatgpt. Writing SQL queries is a fantastic use case for LLMs.
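For anyone curious, the query presumably looked something like this recursive CTE (a sketch only; the events schema and column names are guesses based on the description above, not the actual query):

-- walk each tree from its roots downward, carrying the root id along;
-- the session root resets whenever the session changes mid-tree
WITH RECURSIVE roots AS (
    SELECT id, id AS root_id, session_id, id AS session_root_id
    FROM events
    WHERE parent_id IS NULL

    UNION ALL

    SELECT e.id,
           r.root_id,
           e.session_id,
           CASE WHEN e.session_id = r.session_id
                THEN r.session_root_id
                ELSE e.id END
    FROM events e
    JOIN roots r ON e.parent_id = r.id
)
UPDATE events
SET root_id = roots.root_id,
    session_root_id = roots.session_root_id
FROM roots
WHERE events.id = roots.id;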