Do you make the environments on demand or do you make them preemptively so that one is ready to go the moment that it is needed?
If you make them on demand, have you tested ZFS snapshots to see if it can be done even faster using zfs clone?
We actually use gVisor (as stated in the article) and it has a very nifty feature called checkpoint_restore (https://gvisor.dev/docs/user_guide/checkpoint_restore/) which lets us start up sandboxes extremely efficiently. Then the filesystem is just a CoW overlay.
It's a filesystem, to put it simply.
Furthermore, ZFS ARC should treat each read operation of the same files as reading the same thing, while a sandbox made the traditional way would treat the files as unique, since they would be full copies of each other rather than references. ZFS on the other hand should only need to keep a single copy of the files cached for all environments. This reduces memory requirements dramatically. Unfortunately, the driver has double caching on mmap()’ed reads, but the duplication will only be on the actual files accessed and the copies will be from memory rather than disk. A modified driver (e.g. OSv style) would be able to eliminate the double caching for mmap’ed reads, but that is a future enhancement.
In any case, ZFS clones should have clear advantages over the more obvious way of extracting a tarball every time you need to make a new sandbox for a Python execution environment.
That said, if you really want to use block devices, you could use zvols to get something similar to LVM2 out of ZFS, but it is not as good as using snapshots on ZFS' filesystems. The write amplification would be lower by default (8KB versus 4MB). The page cache would still duplicate data, but the buffer cache duplication should be bypassed if I recall correctly.
In any case, I share your excitement.
The "run" option for generated code was removed due to underutilization, but the sandbox is still used for things like the data analysis workflow and running extensions amongst other things. It's really just a general purpose sandbox for running untrusted code server-side.
One question I always had was what the user "grte" stands for...
Btw. here the tricks I used back then to scrape the file system:
https://embracethered.com/blog/posts/2024/exploring-google-b...
This decoupling of system libraries from the OS itself is necessary because it otherwise becomes unmanageable to ensure "google3 binaries" remain runnable on both workstations and production servers. Workstations and servers each have their own Linux distributions, and each also needs to change over time.
You submitted this.
"Please use the original title, unless it is misleading or linkbait; don't editorialize." - https://news.ycombinator.com/newsguidelines.html
I've done that now (https://news.ycombinator.com/item?id=43509103).
I appreciate your scruples though! Because even though you would have been on the right side of HN's rules to correct a misleading (and/or linkbait) title, the fact that you work for Google would have opened you to the usual gotcha attacks about conflict of interest. This way we avoided all of that, and it's still a good submission and thread!
I am not aware of such rule's existence.
Also "should" not "must."
To be clear: I don't have a problem with you submitting this, but the title appears to be completely false.
> Otherwise please use the original title, unless it is misleading or linkbait; don't editorialize.
Arguably this is misleading or clickbait, but safer to err on the side of using the original title.
Currently, Google has the most cost effective model (Flash 2) for tons of corporate work (OCR, classifiers, etc).
They just announced likely the most capable model currently in the market with Gemini 2.5.
Their small open source models (Gemma 3) are very good.
It is true that they've struggled to execute on product, but the actual technology is very good and getting substantial adoption in industry. Personally I've moved quite a few workloads to Google from OpenAI and Anthropic.
My main complaint is that they often release impressive models, but gimp them in experimental mode for too long, without fully releasing them (2.5 is currently in this category).
As far as I can see, there is a mix of frustration at the slowness of launching, optimism/excitement that there are some really awesome things cooking, and indifference from a lot of people who think AI/LLMs as a product category are quite overhyped.
But the UX and general functionality of their apps and services has been in steep decline for a long time now, imo. There are thousands of examples of the most basic and obvious mistakes and completely uninspired, sloppy software and service design.
That's something you can work on to improve.
A few years back I wanted to work for FAANG big company. Now I don't after working for smaller but with 'big' management. There are rats races, dirty tricks. And engineers don't have much control on what and how they are doing. Many things decided by incompetent managers. Architect position is actually a manager's title, no brain or skills required.
Today I rather go to a small company or startup where the results are visible and appreciated.
But why? When they have so much management now and have just gotten so big that it'd probably be impossible to get anything done.
Profit is what matters though, not number of products. The consumer perception is that Search rakes in the largest profits, so if they lose that, it doesn't matter what else is there. Thoughts?
What casts it as most capable?
(I mean it will tell you it's set a timer but it doesn't talk to the native clock app so nothing ever goes off if you navigate away from the window.)
It can also play music or turn on my smart lamps, change their colors etc. I can't remember doing any special configuration for it to do that either.
Pixel 9 pro
And you used to be able to say "Find my phone" and it would chime and max screen brightness until found. Tried that with Gemini once, and it went on with very detailed instructions on using Google or Apple's Find My Device website (depending on what type of phone I owned), maybe calling it from another device if it's not silenced, or perhaps accepting that my device was lost or stolen if none of the above worked. Did find it during that lengthy attempt at being helpful though.
Another fun example, weather. When Gemini's in control, "What's the weather like tonight?" gets a short ramble about how weather depends on climate, with some examples of what the weather might be like broadly in Canada, Japan, or the United States at night.
Unlike Assistant where you could learn to adapt to its unique phrasing preferences, you just flat out can never reliably predict what Gemini's going to do. In exchange for higher peak performance, the floor dropped out the bottom.
Setting timers reminders, calendar events. Nothing. If they kill the assistant, I'll go Apple, no matter how much I hate it.
It's mostly useful for tracking what Python packages are available (and what versions): https://github.com/simonw/scrape-openai-code-interpreter/blo...
But not, secrecy for the sake of secrecy.
More likely just noone has taken the time and effort to do it.
It's a bit absurd that the best available documentation for that feature exists in my hacky scraped GitHub repository.
I don't think they're all that confidential if they're all on github: https://github.com/ezequielpereira/GAE-RCE/tree/master/proto...
It's not a valid point of criticism. The escape did not in fact "result" in the leak of confidential photos. That already happened somewhere else. This only resulted in the republishing of something already public.
Or another way, it's not merely that they were already public elsewhere, the imortant point is that the photos were not given to the ai in confidence, and so re-publishing them did not violate a confidence, any more than say github did.
I'm no ai apologist btw. I say all of these ais are committing mass copyright violation a million times a second all day every day since years ago now.
I was saying that the article was wrong for saying that, but I was half wrong about that.
I thought that the thing they were talking about was something that the AI got from a public source, in which case the AI didn't disclose anything it was given in confidense. It just republished something that it itself got from a public source in the first place.
Except I think I had that wrong. The stuff was already published elsewhere, but that's not how the AI got it. The leak caused the AI to disclose some of it's own internal workings, which is actually a leak and does "result in the disclosure of something confidential" even of something else elsewhere had already also seperately disclosed the same thing. That other leak has no bearing in this case.
Ironically, though, getting the source code of Gemini perhaps wouln't be valuable at all; but if you had found/obtained access to the corpus that the model was pre-trained with, that would have been kind of interesting (many folks have many questions about that...).
Definitionally, that input gets compressed into the weights. Pretty sure there's a proof somewhere that shows LLM training is basically a one-way (lossy) compression, so there's no way to go back afaik?
https://github.com/ezequielpereira/GAE-RCE/tree/master/proto...
I don't understand why security conferences are attracted to Vegas. In my opinion its a pretty gross place to conduct any conference.
Anyways, security conferences such as BSides run all over the world in various cities where red teaming type activities is embraced. IMO it'd be nice to diversify from Vegas, preferably places with more scenery/greenery like Boulder or something.
I guess this is a failing of the security review process, and possibly also how the blaze build system worked so well that people forgot a step existed because it was too automated.
So does Google Chrome.
Somewhat relatedly, it occurred to me recently just how important issues like prompt injection, etc are for LLMs. I've always brushed them off as unimportant to _me_ since I'm most interested in local LLMs. Who cares if a local LLM is weak to prompt injection or other shenanigans? It's my AI to do with as I please. If anything I want them to be, since it makes it easier to jailbreak them.
Then Operator and Deep Research came out and it finally made sense to me. When we finally have our own AI Agents running locally doing jobs for us, they're going to encounter random internet content. And the AI Agent obviously needs to read that content, or view the images. And if it's doing that, then it's vulnerable to prompt injection by third party.
Which, yeah, duh, stupid me. But ... is also a really fascinating idea to consider. A future where people have personal AIs, and those AIs can get hacked by reading the wrong thing from the wrong backalley of the internet, and suddenly they are taken over by a mind virus of sorts. What a wild future.
This already happens to people on the internet.
""""" As companies rush to deploy AI assistants, classifiers, and a myriad of other LLM-powered tools, a critical question remains: are we building securely ? As we highlighted last year, the rapid adoption sometimes feels like we forgot the fundamental security principles, opening the door to novel and familiar vulnerabilities alike. """"
There this case and there many other cases. I worry for copy & paste dev.
> but those files are internal categories Google uses to classify user data.
I really want to know what kind of classification this is. Could you at least give one example? Like "Has autism" or more like "Is user's phone number"?
Very useful for any scenario where you output the proto, like logs, etc…
And "leaked its source code" is straight up click bait.
(Submitted title was "We hacked Google's A.I Gemini and leaked its source code (at least some part)")
I mean I “hacked” this site too by those standards.
Protobufs aren't really these super secret hyper-proprietary things they seem to make them out to be in this breathless article.
In other words, it should not be about the intent of the requester, but the intent of its owner; and in the case of that article, either by bias in narrative, or the fact that it rhymes with events of the past, there is some tomfoolery about.
Edit: I don't know why the parent comment was flagged. It is entirely accurate.
A valuable information would be able to run those RPC calls as Principal (their root user)