Nanite is a very clever representation of graphics meshes. Meshes are directed acyclic graphs rather than trees: repetition is a link, not a copy. It's recursive; meshes can share submeshes, which in turn can share submeshes, all the way down. It's also set up for within-mesh level-of-detail support, so submeshes drop out when they're small enough. So you can have repetitive content of very large size with a finite amount of data and fast rendering times. The insight is that there are only so many pixels on screen, so there's an upper bound on the rendering work that's actually needed.
There's a really good SIGGRAPH video on this from someone at Epic.
Current GPU designs are a mismatch for Nanite. Some new hardware operations are needed to do more of this in the GPU, where it belongs. Whether that will happen, with Nvidia distracted by the AI market, is a good question.
The scene needs a lot of instancing for this to pay off. Unreal Engine demos show such things as a hall of identical statues. If each statue was different, Nanite would help far less. So it works best for projects where a limited number of objects are reused to create large areas of content. That's the case for most AAA titles. Watch a video of Cyberpunk 2077, and look for railings and trash heaps. You'll see the same ones over and over in totally different contexts.
Making a Nanite mesh is complicated, with a lot of internal offsets for linking, and so far only Unreal Engine's editor does it. With playback now open source, someone will probably build a tool for that too.
Those internal offsets in the format present an attack surface which probably can be exploited with carefully crafted bad content, like hostile Microsoft Word .doc files.
While it does construct a DAG to perform the graph cut, the final data set on disk is just a flat list of clusters for consideration, along with their cutoffs for inclusion/rejection. There seems to be a considerable misunderstanding of what the DAG is used for, and how it's constructed. It's constructed dynamically based on the vertex data, and doesn't have anything to do with how the artist constructed submeshes and things, nor does "repetition become a link".
> The scene needs a lot of instancing for this to pay off. Unreal Engine demos show such things as a hall of identical statues. If each statue was different, Nanite would help far less.
What makes you say this? The graph cut is different for each instance of the object, so they can't use traditional instancing, and I don't even see how it could help.
Look at a terrain example:
In general, I wouldn't think of Nanite as "one thing". It's a combination of many, many different techniques that add up into some really good technology.
If your triangles are at or below the size of a texel, texture values could even be looked up offline and stored in the vertex attributes directly rather than keeping the UV coordinates around, but that may not be a win.
One such thing I did get a fair way into was something like Nanite - I called it compressive meshing. It was a typical case of misguided engineering hubris at work.
The initial work looked promising, but the further into the problem I got, the more complicated the whole thing became. Having to construct the entire asset generation pipeline was just way beyond what I could manage in the time frame, at least to a standard that would look anything like decent and not blow out the memory required.
I did manage to get something that vaguely resembled large scale meshes being rendered at staggered levels of detail, but it ran SLOW and looked like rubbish unless you hammered the GPU to get sub-pixel accuracy. It was a fun experiment, but it was far too much for the hardware and too big a task to take on as a single programmer.
When Epic showed off Nanite... wow, they did what I never could, in a fashion way beyond even my best vision! It is one of those technologies that, when it came along, really was a true solution rather than just hype. Yes, there are limits, as with anything on that scale, but it is one of the technical jewels of the modern graphics world. I have said that if Epic were a publicly traded company I would have considered putting in a sizable amount of money based on the Nanite tech alone.
Of course, the trajectory of GPU advancements is somewhat predictable, and has settled down a little relative to the not-too-distant past. Perhaps some luck was involved, too (:
There's also this short high-level intro (2.5 min) that I thought was decent: "What is virtualized micropolygon geometry? An explainer on Nanite" (https://www.youtube.com/watch?v=-50MJf7hyOw)
Not a major/mainstream engine by any means (a small Rust ECS game engine) but Bevy also supports something similar under the feature name "Virtual Geometry", mentioned here: https://bevyengine.org/news/bevy-0-14/#virtual-geometry-expe...
Also, a technical deep dive into the feature from one of the authors of the feature: https://jms55.github.io/posts/2024-06-09-virtual-geometry-be...
The logic behind Nanite, as I understood it, was to keep the mesh accuracy at roughly 1-pixel precision. So, for example, a low-detail mesh can be used with coordinates rounded to just 10 bits (or whatever) if the resulting error is only about half a pixel when perspective-projected onto the screen.
I vaguely remember the quantisation pulling double duty: not only does it reduce the data storage size, it also helps the LOD generation because it snaps vertices to the same locations in space. The duplicates can then be eliminated.
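For illustration, here's a rough sketch of that snapping-and-welding step in TypeScript (my own sketch, not Nanite's or UE's actual code; the 10-bit grid and the function names are just assumptions):

  // Sketch: quantize vertex positions to a 10-bit grid inside a known AABB
  // and merge vertices that land on the same grid cell.
  // positions are laid out as [x0, y0, z0, x1, y1, z1, ...].
  function quantizeAndWeld(
    positions: Float32Array,
    min: [number, number, number],
    max: [number, number, number],
    bits = 10
  ): { grid: Uint16Array; remap: Uint32Array; uniqueCount: number } {
    const levels = (1 << bits) - 1;
    const vertexCount = positions.length / 3;
    const grid = new Uint16Array(vertexCount * 3);   // quantized coordinates
    const remap = new Uint32Array(vertexCount);      // old vertex index -> welded index
    const seen = new Map<string, number>();
    let uniqueCount = 0;

    for (let i = 0; i < vertexCount; i++) {
      const q: number[] = [];
      for (let axis = 0; axis < 3; axis++) {
        const extent = max[axis] - min[axis] || 1;
        const t = (positions[i * 3 + axis] - min[axis]) / extent;
        q.push(Math.round(t * levels)); // snap to the grid
      }
      const key = q.join(",");
      let index = seen.get(key);
      if (index === undefined) {
        index = uniqueCount++;
        seen.set(key, index);
        grid.set(q, index * 3);
      }
      remap[i] = index; // duplicates collapse onto the same welded vertex
    }
    return { grid, remap, uniqueCount };
  }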
meshoptimizer [1] is an OSS implementation of meshlet generation, which is what most people think of when they think of "Nanite's algorithm". Bevy, mentioned in a sibling reply, uses meshoptimizer as the generation tool.
(Strictly speaking, "Nanite" is a brand name that encompasses a large collection of techniques, including meshlets, software rasterization, streaming geometry, etc. For clarity, when discussing these concepts outside of the context of the Unreal Engine specifically, I prefer to refer to individual techniques instead of the "Nanite" brand. They're really separate, even though they complement one another. For example, software rasterization can be profitably used without meshlets if your triangles are really small. Streaming geometry can be useful even if you aren't using meshlets. And so on.)
That said I do know zeux was interested in experimenting with Nanite-like DAGs directly in meshoptimizer, so maybe a future version of the library will have an end-to-end API.
That's not what this is though. It's an implementation of the techniques/technology used in Nanite. It doesn't load data from Unreal Engine's editor. One of the mentioned goals:
Simplicity. We start with an OBJ file and everything is done
in the app. No magic pre-processing steps, Blender exports, etc.
You set the breakpoint at loadObjFile() and F10 your way till
the first frame finishes.
Unreal 5 was only released in 2022, and we have been iterating on the Nanite idea since then. With Unreal 5.5 and more AAA gaming titles shipping, we can take what we learned and put it into hardware. Not to mention the lead time is 3-4 years. Even if Nvidia had decided to build it in 2023, it would have been at least 2026 before we saw any GPU acceleration.
As for this project, Scthe did a great job! I've been talking with them about several parts of the process, culminating in some improvements to Bevy's code based on their experience (https://github.com/bevyengine/bevy/pull/15023). Always happy to see more people working on this, Nanite has a ton of cool ideas.
I am on Chromium, not Chrome, and use WebGPU all the time, but the demos tell me to use Chrome, which I cannot do ethically. Would love to try the demos out, this looks like a lot of hard work!
On Android you need at least Android 12, with Vulkan drivers that are good enough and not blacklisted.
A lot of it runs fine with a flag.
Do you know what's blocking?
That's a fine goal.
When I was writing my own component framework for browsers, detection was regularly impossible and I had to depend on browser sniffing. The Modernizr code has some very smart hacks (sometimes very dirty hacks) to detect features - a large amount of work went into developing trustworthy detection code. And detection was usually via side effects.
My educated guess is that feature detection for Web3D is not simple. A quick google and I didn't find an obvious Web3D feature detection library.
Here's part of the detection code for :checked support in Modernizr:
Modernizr.addTest('checked', function(){
return Modernizr.testStyles('#modernizr input {width:100px} #modernizr :checked {width:200px;display:block}', function(elem, rule){
    // ... (the omitted body creates a checked <input> inside elem and
    // checks whether the :checked rule actually took effect, e.g. via its width)
  });
});
>I am on Chromium, not Chrome
Don't know about your build, but I'm using Ungoogled Chromium, and it has the exact same user-agent string as Google Chrome.
Have you enabled the WebGL permission for the site in site settings? I think it was disabled by default for me.
WebGPU error [frame][validation]: Fill size (7160950) is not a multiple of 4 bytes. - While encoding [CommandEncoder "main-frame-cmd-buffer"].ClearBuffer([Buffer "rasterize-sw"], 0, 7160950).
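For what it's worth, WebGPU's clearBuffer does require the size to be a multiple of 4 bytes, so a likely app-side fix is to round the fill size up (and size the buffer accordingly). A minimal sketch, with made-up names:

  // Round a byte size up to the 4-byte multiple that clearBuffer requires.
  function alignTo4(byteSize: number): number {
    return Math.ceil(byteSize / 4) * 4;
  }

  // Hypothetical usage: pad the software-rasterizer buffer before clearing it.
  const swRasterByteSize = alignTo4(7160950); // -> 7160952
  // const buffer = device.createBuffer({
  //   size: swRasterByteSize,
  //   usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
  // });
  // encoder.clearBuffer(buffer, 0, swRasterByteSize);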
I'm curious, what for?
Lots of future possibilities as well once support is more ubiquitous!
https://github.com/xenova/whisper-web/tree/experimental-webg...
https://huggingface.co/spaces/Xenova/whisper-speaker-diariza...
https://huggingface.co/onnx-community/pyannote-segmentation-...
Speaker diarization is quite difficult as you know, especially in loud or crowded environments, and the model is only part of the story. A lot of tooling needs to be built out for things like natural interruption, speaker memory, context-switching, etc. in order to create a believable experience.
https://vcg.isti.cnr.it/~ponchio/download/ponchio_phd.pdf (107 pages!)
To get it working with 32-bit atomics at all, this demo reduces depth to just 16 bits (not enough to avoid artifacts) and only encodes a normal vector into the other 16 bits, which is why the compute-rasterized pixels are untextured. There just aren't enough bits to store any more material parameters or a primitive ID, the latter being how Nanite does it.
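To make the packing concrete, here's the general 16/16 scheme as a host-side TypeScript sketch (my own illustration, not this demo's actual code; it assumes depth is mapped so that larger means closer, which is what lets a single atomicMax in the shader keep the nearest sample together with its payload):

  // Illustration of the 16/16 packing used by a 32-bit-atomic software rasterizer.
  // In the compute shader this would be an atomicMax(&buffer[pixel], packed) in WGSL.

  // Quantize depth to 16 bits; assumes LARGER depth value = CLOSER to the camera.
  function packDepthPayload(depth01: number, payload16: number): number {
    const depth16 = Math.min(0xffff, Math.max(0, Math.round(depth01 * 0xffff)));
    return ((depth16 << 16) | (payload16 & 0xffff)) >>> 0; // depth sits in the high bits, so it compares first
  }

  function unpackDepth(packed: number): number {
    return (packed >>> 16) / 0xffff;
  }

  function unpackPayload(packed: number): number {
    return packed & 0xffff; // e.g. a packed normal here; Nanite would want a primitive ID instead
  }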
https://developer.apple.com/metal/Metal-Feature-Set-Tables.p...
[1] https://webgpu.github.io/webgpu-samples/?sample=texturedCube
WebGPU error [init][validation]: 6 errors generated while compiling the shader:
50:22: unresolved call target 'pack4x8snorm'
50:9: cannot bitcast from '⊥' to 'f32'
54:10: unresolved call target 'unpack4x8snorm'
59:22: unresolved call target 'pack4x8unorm'
59:9: cannot bitcast from '⊥' to 'f32'
63:9: unresolved call target 'unpack4x8unorm'
There is also Bevy's Virtual Geometry that provides similar functionality and is probably much more useful since it's written in Rust and integrated with a game engine: https://jms55.github.io/posts/2024-06-09-virtual-geometry-be...
If I made an “implementation of OpenAI’s GPT-3 in JS” you would understand that to mean I took the architecture from the whitepaper and reimplemented it.
This technique is starting to appear in a variety of places. Nanite definitely made the idea famous, but Nanite is the name of a specific implementation, not the name of the technique.
Godot has automatic LOD which seems pretty cool for what it is: https://docs.godotengine.org/en/stable/tutorials/3d/mesh_lod...
Unity also has an LOD system, though despite how popular the engine is, you have to create LOD models manually: https://docs.unity3d.com/Manual/LevelOfDetail.html (unless you dig through the asset store and find a plugin)
I did see an interesting approach in a lesser known engine called NeoAxis: https://www.neoaxis.com/docs/html/NeoAxis_Levels.htm However, that engine ran very poorly for me on my old RX580, although I haven't tried it on my current A580.
As far as I can tell, Unreal is really quite far ahead of the competition when it comes to putting lots of things on the screen; the downside is that artists will be tempted to include higher quality assets in their games, bloating install sizes considerably.
The main selling point of Nanite is really just to reduce artist costs by avoiding manual LODs. But a high quality automatic LOD at build time may (read: almost certainly does) strike a much better balance for both current and near-future hardware.
You can't have a manual LOD for a cliff where half is near the player and should be high resolution, and half is further away and can be low resolution. Nanite's hierarchical LODs are a huge improvement for this.
You're also underestimating the amount of time artists have to spend making and tweaking LODs, and how big of an impact skipping that is.
It's a bad value proposition for end-users. Nanite is much slower for the same image quality that a bespoke solution would offer, which is evident with several AAA titles that choose to use in-house tech over UE.
[1]: https://www.reddit.com/r/VoxelGameDev/comments/1bz5vvy/a_sma...
So if you have a very small triangle (small as in how many pixels on screen it covers) that covers just 1 pixel, you still pay the price of a 2x2 block (4 pixels instead of 1), so you've just done 300% extra work.
Nanite auto-picks the best triangle to minimize this and probably many more perf metrics that I have no idea about.
So even if you do it in software, the point is that if you can get rid of that 2x2 block penalty as much as possible, you can be faster than the GPU doing 2x2 blocks in hardware, since pixel shaders can be very expensive.
This issue gets worse the larger the rendering resolution is.
Nanite then picks larger triangles instead of those tiny 1-pixel ones since those are too small to give any visual fidelity anyway.
Nanite is also not used for large triangles since those are more efficient to do in hardware.
Of course the obvious problem with that is if you don't have most of the screen covered in such small triangles then you're paying a large cost for nanite vs traditional means.
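A back-of-envelope model of that waste (my own rough estimate, ignoring edge and scheduling effects):

  // Rough model of helper-lane waste from 2x2 quad shading.
  // coveredPixels: pixels actually inside the triangle.
  // quadsTouched: 2x2 blocks the triangle overlaps (each one shades 4 lanes).
  function quadShadingEfficiency(coveredPixels: number, quadsTouched: number): number {
    return coveredPixels / (quadsTouched * 4);
  }

  quadShadingEfficiency(1, 1);    // 1-pixel triangle: 0.25 -> 75% of the shading work wasted
  quadShadingEfficiency(100, 40); // larger triangle: ~0.63, waste is mostly along the edges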
1. HW always rasterizes 2x2 blocks of pixels so it can have derivatives, even if you don't use them.
2. Accessing SV_PrimitiveID is surprisingly slow on Nvidia/AMD; by writing it out in the PS you will take a huge perf hit in HW. There are ways to work around this, but they aren't trivial and differ between vendors, and you have to be aware of the issue in the first place! I think some of the "software" > "hardware" raster results may come from this.
The HW shader in this demo looks wonky though, it should be writing out the visibility buffer, and instead it is writing out a vec4 with color data, so of course that is going to hurt perf. Way too many varyings being passed down also.
With high triangle counts, you want the HW rasterizer's visibility-buffer PS to do as little compute as possible and write as little as possible, so it should only have 1 or 2 input varyings and simply write them out.
This is in contrast to hardware rasterization, where there is dedicated hardware on the GPU that decides which pixels are covered by a given triangle and assigns those pixels to a fragment shader, where the color (and potentially other things) is computed and finally written to the render target as a raster op (also a bit of specialized hardware).
The seminal paper on this is cudaraster [1], which implemented basic 3D rendering in CUDA (the CUDA of 13 years ago is roughly comparable in power to compute shaders today), and basically posed the question: how much does using the specialized rasterization hardware help, compared with just using compute? The answer is roughly 2x, though it depends a lot on the details.
And those details are important. One of the assumptions that hardware rasterization relies on for efficiency is that a triangle covers dozens of pixels. In Nanite, that assumption is not valid; in fact, a great many triangles are approximately a single pixel, and then software/compute approaches actually start beating the hardware.
Nanite, like this project, thus actually uses a hybrid approach: rasterization for medium to large triangles, and compute for smaller ones. Both can share the same render target.
[1]: https://research.nvidia.com/publication/2011-08_high-perform...
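As a rough illustration of that hybrid split (my own sketch; the real classification happens per cluster on the GPU, and the 32-pixel threshold plus all the names here are just assumptions), you can route by projected screen size:

  // Sketch: route a cluster to the software (compute) or hardware rasterizer
  // based on its approximate projected size on screen.
  interface ClusterBounds {
    centerViewZ: number; // view-space depth of the bounding sphere center
    radius: number;      // bounding sphere radius in view-space units
  }

  function useSoftwareRaster(
    cluster: ClusterBounds,
    projScaleY: number,     // cot(fovY / 2), i.e. projection matrix [1][1]
    viewportHeight: number,
    pixelThreshold = 32     // below this projected size, tiny triangles dominate
  ): boolean {
    // Approximate projected diameter in pixels: 2 * r * cot(fov/2) / z * (height / 2).
    const projectedPixels =
      (cluster.radius * projScaleY * viewportHeight) / Math.max(cluster.centerViewZ, 1e-4);
    return projectedPixels < pixelThreshold; // small on screen -> compute rasterizer
  }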
WebGPU error [frame][validation]: Fill size (7398781) is not a multiple of 4 bytes. - While encoding [CommandEncoder "main-frame-cmd-buffer"].ClearBuffer([Buffer "rasterize-sw"], 0, 7398781).
ID3D12Device::GetDeviceRemovedReason failed with DXGI_ERROR_DEVICE_HUNG (0x887A0006)
- While handling unexpected error type Internal when allowed errors are (Validation|DeviceLost).
at CheckHRESULTImpl (..\..\third_party\dawn\src\dawn\native\d3d\D3DError.cpp:119)
at CheckAndUpdateCompletedSerials (..\..\third_party\dawn\src\dawn\native\d3d12\QueueD3D12.cpp:179)
at CheckPassedSerials (..\..\third_party\dawn\src\dawn\native\ExecutionQueue.cpp:48)
at Tick (..\..\third_party\dawn\src\dawn\native\Device.cpp:1730)
Backend messages:
* Device removed reason: DXGI_ERROR_DEVICE_HUNG (0x887A0006)
it's closed source, but I found the discussion and description of the tradeoffs interesting
Just a disclaimer that it will only work in a WebGPU-enabled browser on Windows (Chrome, Edge, etc.); unfortunately Mac has issues for now. Also, there is no Nanite in this demo, but it will be possible in the future.
> UE5's Nanite implementation using WebGPU. Includes the meshlet LOD hierarchy, software rasterizer and billboard impostors. Culling on both per-instance and per-meshlet basis.
WebGPU -> https://developer.mozilla.org/en-US/docs/Web/API/WebGPU_API
Meshlet -> https://developer.nvidia.com/blog/introduction-turing-mesh-s...
LOD -> https://en.wikipedia.org/wiki/Level_of_detail_(computer_grap...
Software rasterizer -> https://en.wikipedia.org/wiki/Rasterisation ("software" means it runs on the CPU instead of GPU)
Billboard imposters -> https://www.alanzucconi.com/2018/08/25/shader-showcase-satur...
Culling -> https://en.wikipedia.org/wiki/Hidden-surface_determination
no, in this context it means that the rasterisation algorithm is implemented in a compute kernel, rather than using the fixed hw built into the gpu. so rasterization still happens on the gpu, just using programmable blocks.
Yet this post is now ranked #1 on HN.
Getting that on Chromium, lol.
Edit: WebGPU in chrome is behind a flag on linux: https://github.com/gpuweb/gpuweb/wiki/Implementation-Status#...
https://developer.mozilla.org/en-US/docs/Web/API/Device_orie...
No WebGPU available. Please use Chrome.
on chrome (Version 129.0.6668.29 (Official Build) beta (64-bit)), under windows
Turn it back off when done, as tools like noscript only block webgl tags.
Cheers =3
> I could have built this with Vulkan and Rust. None would touch it.
Fucking .. bravo man.