This is comparable to the GBA, which has 384KB of total RAM and a ROM cartridge slot for storing the game code and data. But the GBA runs at only 16MHz, while the EFR32MG24 system used for this project is overclocked to 136.5MHz.
I assume you are thinking of the 32KiB of on-chip work RAM plus 256KiB of on-board work RAM plus 96KiB of video RAM. But pedantically there is also a 1KiB region of palette RAM and 1KiB of "object attribute memory", separate from the VRAM, making 386KiB total. (Not counting the I/O control registers, which one ordinarily wouldn't think of as "memory" but get a dedicated region of that address space.)
Aside from the ROM on a cartridge - up to 32MiB - there is 16KiB of BIOS ROM, and the system can address up to 64KiB of cartridge save memory (SRAM, Flash, or EEPROM depending on the cartridge) for game save data.
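For anyone who wants the whole layout in one place, here are the base addresses (per GBATEK; sizes as discussed above):

    /* GBA address map (per GBATEK); sizes as discussed above */
    #define GBA_BIOS_ROM  0x00000000u  /* 16 KiB  system ROM                   */
    #define GBA_EWRAM     0x02000000u  /* 256 KiB on-board (external) work RAM */
    #define GBA_IWRAM     0x03000000u  /* 32 KiB  on-chip (internal) work RAM  */
    #define GBA_IO        0x04000000u  /* memory-mapped I/O control registers  */
    #define GBA_PALETTE   0x05000000u  /* 1 KiB   BG/OBJ palette RAM           */
    #define GBA_VRAM      0x06000000u  /* 96 KiB  video RAM                    */
    #define GBA_OAM       0x07000000u  /* 1 KiB   object attribute memory      */
    #define GBA_CART_ROM  0x08000000u  /* cartridge ROM, up to 32 MiB          */
    #define GBA_CART_SRAM 0x0E000000u  /* cartridge save memory, up to 64 KiB  */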
Quake will probably run at 60 FPS on the RP2350, double buffered and with full sound quality. But it's nowhere near as hard to achieve there as on the Arduino Nano Matter board: the RP2350 has 520 kB of RAM, a dual-core Cortex-M33, and can run at up to 300 MHz (150 MHz nominal).
Anything you send out over a dedicated video port is gone as far as you're concerned.
Still, it was done with 50% more memory, 1/3 of the resolution, and without implementing all of the game's features.
The ARM7TDMI takes 1-4 cycles to perform a simple 32-bit x 32-bit multiply, depending on the value of the multiplier operand. I believe the Cortex-M33 takes just 1 cycle to do the same. The ARM7TDMI has no divide instruction and, critically, no FPU, which Quake requires.
The GBA has only 32 kB of zero-wait-state RAM (AKA internal working RAM), versus 276 kB on the Arduino Nano.
The GBA's 256 kB RAM block (external working RAM) has a massive 6-cycle access time when loading a 32-bit value.
It's a true miracle someone managed to get even 1/3 of the resolution on such weak hardware!
I guess an FPU would not even be required at 120 px horizontal resolution.
The Cortex-M33 does even more in a single cycle: two 16-bit multiplications, an addition, and an accumulation, for instance.
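For a concrete (hypothetical) illustration: with the CMSIS intrinsics on Cortex-M33 parts that have the DSP extension (as the EFR32MG24 does), that dual 16-bit multiply-accumulate maps to a single __SMLAD instruction. The dot_q15 helper below and the header choice are mine, not from the port:

    #include <stdint.h>
    #include <string.h>
    #include "em_device.h"   /* Silicon Labs device header; pulls in the CMSIS core intrinsics */

    /* Dot product of two int16 arrays, two lanes per __SMLAD call: each call
       does two 16x16 multiplies, adds the two products and accumulates,
       in one cycle on cores with the DSP extension. */
    static int32_t dot_q15(const int16_t *a, const int16_t *b, int n)
    {
        int32_t acc = 0;
        for (int i = 0; i + 1 < n; i += 2) {
            uint32_t va, vb;
            memcpy(&va, &a[i], sizeof va);   /* pack two consecutive int16 lanes */
            memcpy(&vb, &b[i], sizeof vb);
            acc = (int32_t)__SMLAD(va, vb, (uint32_t)acc);
        }
        if (n & 1)                           /* odd tail element */
            acc += (int32_t)a[n - 1] * b[n - 1];
        return acc;
    }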
Still, it is the first time the "full" Quake has been ported to run in less than 300 kB of RAM.
Quake performs one FPU divide per pixel for texture mapping perspective correction.
The ARM7TDMI does not have any kind of divide instruction, so perspective correction is tricky, even if it's just 120 px horizontally.
By the way, it's "d_scan.c" for anyone who's trying to web search for it.
Quake had to do it this way (a true perspective divide only every 16 pixels, with linear interpolation in between) because a per-pixel divide would have been too expensive, especially on a low-end Pentium, when it was released in 1996. And yes, it is not even noticeable, especially at low res.
There are some user-made maps, however, where this can be seen (e.g. I remember playing a map that was supposed to be set inside a fantasy town, and it used a bunch of wood-on-wall textures that made the distortion apparent).
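For anyone curious what that trade-off looks like in code, here's a rough sketch of the idea (not id's actual d_scan.c; the names are mine): do the true perspective divide only at 16-pixel boundaries and interpolate the texture coordinates linearly in between.

    /* Sketch of span-based perspective correction (the idea behind d_scan.c).
       u/z, v/z and 1/z interpolate linearly in screen space, so we divide
       only every SPAN pixels and lerp u,v across each 16-pixel run. */
    #define SPAN 16

    static void draw_span(float uz, float vz, float iz,       /* u/z, v/z, 1/z at span start */
                          float duz, float dvz, float diz,    /* per-pixel steps             */
                          unsigned char *dest, int count,
                          const unsigned char *tex, int texw)
    {
        float u0 = uz / iz, v0 = vz / iz;                     /* divide at left edge  */
        while (count > 0) {
            int run = count > SPAN ? SPAN : count;
            float uz1 = uz + duz * run, vz1 = vz + dvz * run, iz1 = iz + diz * run;
            float u1 = uz1 / iz1, v1 = vz1 / iz1;             /* divide at right edge */
            float du = (u1 - u0) / run, dv = (v1 - v0) / run; /* lerp in between      */
            for (int i = 0; i < run; i++) {
                *dest++ = tex[(int)v0 * texw + (int)u0];
                u0 += du; v0 += dv;
            }
            u0 = u1; v0 = v1;
            uz = uz1; vz = vz1; iz = iz1;
            count -= run;
        }
    }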
The youth, so sweet and naive :) EDO RAM on an average Pentium motherboard does around 50-70MB/s. 256-1024KB of L2 cache bumps that to 70-120MB/s depending on the chipset and cache type (and obviously the usage pattern; Quake wasn't optimized in that respect at all). The tiny 8KB of L1 stays below 200MB/s.
Quake was indeed optimized to work on such 1996 Pentium PCs. Look for instance at how the edge/surface/span arrays are allocated on the stack: they allocate extra space to make sure the data is aligned to the cache-line size.
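The pattern looks roughly like this (a sketch of the trick with made-up sizes and a simplified span struct, not the literal r_edge.c code): over-allocate the stack array by one cache line and round the pointer up, so the hot data starts on a cache-line boundary.

    #include <stdint.h>

    #define CACHE_SIZE 32            /* cache-line size Quake assumes           */
    #define MAXSPANS   3000          /* illustrative capacity, not Quake's value */

    typedef struct { int u, v, count; } espan_t;   /* simplified span record */

    void scan_edges_sketch(void)
    {
        /* Over-allocate by one cache line, then round the pointer up so the
           array starts exactly on a cache-line boundary. */
        unsigned char basespans[MAXSPANS * sizeof(espan_t) + CACHE_SIZE];
        espan_t *spans = (espan_t *)(((uintptr_t)basespans + CACHE_SIZE - 1)
                                     & ~(uintptr_t)(CACHE_SIZE - 1));
        (void)spans;  /* ... fill and drain 'spans' as the renderer walks edges ... */
    }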
320MB/s is even faster than the theoretical maximum of EDO on the Pentium platform: 8 bytes x 66MHz with 5-2-2-2 timings gives less than 260MB/s of burst bandwidth.
Quake was optimized for prefilling caches, but not for contemporary cache sizes. https://dependency-injection.com/2mb-cache-benchmarks/ Doom gains a tiny amount when going from 256KB to 512KB; Quake gains linearly all the way up to a mind-bogglingly absurd 2MB of L2. It could really have benefited from data-oriented design, but there was no tooling for that at the time, not to mention the time crunch; Abrash did all he could under the circumstances.
Peak bandwidth should be measured with an ideally infinite (i.e. large enough) payload, so that latency becomes negligible. When you have these two values, latency and peak bandwidth, you can estimate your (still theoretical, of course) performance for a given transfer size.
The article uses the 240-320 MB/s peak bandwidth and the 110-130 ns latency for a comparison with the external flash used here, which has latency in the µs range and a peak bandwidth of 17 MB/s (arguably also assuming an infinite payload, as 136.5/8 is about 17, i.e. without taking the initial setup time into account).
Still, even if you compare the actual speeds of a 1996 Pentium with the theoretical external-flash figures cited in the article, the conclusion does not change: the external flash is much slower than what you could get even in 1996.
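A quick back-of-the-envelope way to combine the two numbers (my own sketch, using the figures quoted above; the 3 µs flash latency is an assumption, since the comment only says "in the µs range"): effective time ≈ latency + size / peak bandwidth, so effective bandwidth collapses for small transfers.

    #include <stdio.h>

    /* Effective bandwidth for a transfer of 'bytes' given a fixed initial
       latency and a peak (streaming) bandwidth. */
    static double eff_mbps(double bytes, double latency_s, double peak_mbps)
    {
        double t = latency_s + bytes / (peak_mbps * 1e6);   /* seconds per transfer */
        return bytes / t / 1e6;                             /* effective MB/s       */
    }

    int main(void)
    {
        double sizes[] = { 32, 1024, 65536 };
        for (int i = 0; i < 3; i++)
            printf("%6.0f B: EDO DRAM %.0f MB/s, external flash %.1f MB/s\n",
                   sizes[i],
                   eff_mbps(sizes[i], 120e-9, 280.0),   /* ~120 ns, ~280 MB/s        */
                   eff_mbps(sizes[i], 3e-6, 17.0));     /* assumed ~3 us, 17 MB/s    */
        return 0;
    }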
It's not about the RAS. Bandwidth is bandwidth. When someone says
> In fact, the bandwidth for sequential reads varied a lot but with a 40 MHz EDO 64-bit DRAM (already available on 1996) one could get a maximum throughput of 320 MB/s
it tells me they multiplied 40MHz by 8 bytes and called it good. That's not how EDO works. EDO still needs a CAS cycle for every new access, even a linear one. It's BEDO (Burst EDO) that has a 5-1-1-1 pattern.
https://www.electronics-notes.com/articles/electronic_compon...
https://dosdays.co.uk/topics/chipsets.php#VP1
BEDO DRAM Read Timings (66MHz) 5-1-1-1
EDO DRAM Read Timings (66MHz) 5-2-2-2
FPM DRAM Read Timings (66MHz) 5-3-3-3
SDRAM Read Timings (66MHz) 5-1-1-1
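For what it's worth, here's how those patterns translate into burst bandwidth for a 32-byte cache-line fill on a 64-bit, 66 MHz bus (my own arithmetic, not from the linked pages):

    #include <stdio.h>

    /* 32-byte cache-line fill: 4 beats of 8 bytes on a 64-bit bus at 66 MHz,
       with the x-y-y-y cycle counts listed above. */
    int main(void)
    {
        struct { const char *name; int cycles[4]; } ram[] = {
            { "BEDO  5-1-1-1", {5, 1, 1, 1} },
            { "EDO   5-2-2-2", {5, 2, 2, 2} },
            { "FPM   5-3-3-3", {5, 3, 3, 3} },
            { "SDRAM 5-1-1-1", {5, 1, 1, 1} },
        };
        const double bus_mhz = 66.0, bytes_per_beat = 8.0;

        for (int i = 0; i < 4; i++) {
            int total = 0;
            for (int j = 0; j < 4; j++) total += ram[i].cycles[j];
            double mbps = 4 * bytes_per_beat * bus_mhz / total;   /* MB/s */
            printf("%s: %2d cycles per line fill -> %.0f MB/s burst\n",
                   ram[i].name, total, mbps);
        }
        return 0;
    }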
Absolute maximum _purely theoretical_ EDO burst bandwidth at 66MHz is <260MB/s. That doesn't take into account the reality of 1996 hardware: processors (Intel still hadn't acknowledged that 'rep movsb' should be optimized), chipsets, and their cache subsystems (the cache sat on the same bus as the RAM, so no parallel accesses, and the lookup slows down reads). On real hardware 50-70 MB/s is all you get.
> The article uses the 240-320 MB/s

The article states "1996) one could get a maximum throughput of 320 MB/s", which is ~5x higher than reality. I'm not arguing the achievement realized here is somehow lesser because of this mistake. I'm pointing out that the assumptions about vintage hardware were incorrectly inflated. In fact, those assumptions might have led to lower expectations and a worse outcome. Usually, learning that something is possible with less is a strong catalyst to keep trying until you get there. A great example of this effect, while still staying on topic, is the Video7 FIFO story told by Abrash: https://www.bluesnews.com/abrash/chap64.shtml
Abrash: "...push past the limits he had unconsciously set in coming up with his original design. And, in the end, I think that the single most important element of great design, whether it be hardware or software or any creative endeavor, is precisely what the Paradise news triggered in Tom: The ability to detect the limits you have built into the way you think about your design, and transcend those limits."
I don't see a way to comment on the link I posted (the Silabs community), but one can leave comments on the detailed blog. I will post a link to this conversation there, asking them to address the issue, and we'll see what they say.
A lot of modern software should really be built this way to limit the amount of energy used. Especially on laptops, but also in the cloud.
Yet it is almost never considered worth it to optimize, compared to adding more features to fill a list ;)
bragr@<>:~$ dig +short community.silabs.com
community.silabs.com.00da0000000l2kimas.live.siteforce.com.
sdc.prod.communities.salesforce.cdn.edgekey.net.
e78038.dsca.akamaiedge.net.
173.223.234.17
173.223.234.11
bragr@<>:~$ curl -Is https://community.silabs.com/s/share/a5UVm000000Vi1ZMAS/quake-ported-to-arduino-nano-matter-and-sparkfun-thing-plus-matter-boards?language=en_US | grep -i cache
cache-control: no-cache,must-revalidate,max-age=0,no-store,private
x-origin-cache-control: no-cache,must-revalidate,max-age=0,no-store,private
That said, the assets are cacheable, so there was probably just a thundering herd for the assets until they were well cached by Akamai's mid and edge tiers.

If it doesn't feel like it's cached, it probably isn't; but you can't assume the cache-control headers you see are controlling the CDN.
It's exactly what 2024 feels like. Future sucks.