What could go wrong?
Jokes aside, this is insanely cool!
Even more interesting is that you built an easy-to-use environment-authoring tool, which (though I haven't tried it yet) looks really slick.
Both of those are impressive alone but together that’s very exciting.
I didn't see it in an obvious place on your GitHub; do you have any plans to open-source the training code?
I wonder if there are any computer vision projects that take a similar world emulation approach?
Imagine if you also collected depth data.
Is OP the blog's author? In the post, the author said the purpose of the project is to show why NNs are truly special, and I wanted a more articulate view of why they think that. Good work anyway!
The special aspect of NNs (in the context of simulating worlds) is that they can mimic entire worlds from videos alone, without access to the source code (in the case of Pokémon) or even without the source code ever having existed (as is the case for the real-world forest trail mimicked in this post). They mimic the entire interactive behavior of the world, not just the geometry (note e.g. the not-programmed-in autoexposure that appears when you look at the sky).
Although the neural world in the post is a toy project, and quite far from generating photorealistic frames with "trees that bend in the wind, lilypads that bob in the rain, birds that sing to each other", I think getting better results is mostly a matter of scale. See e.g. the GAIA-2 results (https://wayve.ai/wp-content/uploads/2025/03/generalisation_0..., https://wayve.ai/wp-content/uploads/2025/03/unsafe_ego_01_le...) for an example of what WMs can do without the realtime-rendering-in-a-browser constraints :)
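To make the "mimics the interactive behavior" point concrete, here's a minimal sketch of the underlying recipe: an action-conditioned next-frame predictor trained on recorded (frame, action, next frame) triples. Everything below (architecture, shapes, hyperparameters) is an illustrative assumption, not the actual model from the post.

```python
# Toy world model: predict the next frame from the current frame plus the
# player's action. Architecture and sizes are assumptions for illustration.
import torch
import torch.nn as nn

class TinyWorldModel(nn.Module):
    def __init__(self, n_actions: int = 8, hidden: int = 64):
        super().__init__()
        # Encode the current 3x64x64 frame into a 16x16 feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, hidden, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Embed the discrete action so it can condition the prediction.
        self.action_embed = nn.Embedding(n_actions, hidden)
        # Decode features + action back into a predicted next frame.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(hidden, hidden, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(hidden, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, frame: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        z = self.encoder(frame)                          # (B, hidden, 16, 16)
        a = self.action_embed(action)[:, :, None, None]  # (B, hidden, 1, 1)
        return self.decoder(z + a)                       # (B, 3, 64, 64)

model = TinyWorldModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# One training step on a dummy batch; real data would be consecutive video
# frames and the control input recorded between them.
frames = torch.rand(16, 3, 64, 64)
actions = torch.randint(0, 8, (16,))
next_frames = torch.rand(16, 3, 64, 64)

loss = nn.functional.mse_loss(model(frames, actions), next_frames)
loss.backward()
opt.step()
```

At inference time you feed the model's own prediction back in as the next input frame; that autoregressive rollout is what turns a next-frame predictor into an explorable world.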
Imagine a similar technique but with productivity software, and a pre-trained network that adapts quickly.
edit: I see now that you mention a price point of 100 GPU-hours / roughly $100. My mistake.