The title is an amusing crash blossom; it's about a generator of fast 6502 code, rather than a fast generator of 6502 code or one of the other four possibilities.
He says it generates better code than anything he's seen, and on reading the writeup, it's easy to see why:
> The ordered basic blocks are then processed from beginning to end, generating multiple assembly code combinations per operation. The list of combinations will grow at an exponential rate, but most can be pruned using ideas from dynamic programming and branch-and-bound. (...) simulated annealing (...) solving a partitioned boolean quadratic problem (PBQP).
It's tempting to assume that all this would have been infeasible for financial reasons when the 6502 was in common use, and maybe it is a little memory-hungry, but I think the bigger issue is that this is the kind of work you'd normally expect to see in a doctoral dissertation in operations research, applied mathematics, or GOFAI; there just weren't that many people out there who such an approach would occur to.
Now that it's out there, though, it seems like you could apply the same techniques to code generation for more mainstream hardware. (Some of the techniques like the tree-search register allocation would probably require adjustment.) Maybe it wouldn't provide the same benefits because generating optimal code is so easy now, I don't know. Has anyone tried?
This is a super hard-core hobby project! The 6502 is a great CPU, at least if you are hand-writing assembler for it, but a poor target for high level languages and difficult to generate high quality code for due to the extremely limited number of registers, quirky addressing modes, and lack of regularity. It's great to see the 6502 getting some love, even if it's 40 years too late to be very useful!
Not to detract from this work - it's really cool and interesting! But i wonder if it would have been relevant at the time.
They are NOT comparing the output from the same programming languages and so probably not the same source code either.
- Their compiler uses a custom language (NesFAB) and does NOT support C
- The reference compilers are all using C
"NESFab is a new programming language for creating NES games. Designed with 8-bit limitations in mind, the language is more ergonomic to use than C, while also producing faster assembly code."
And http://pubby.games/nesfab/doc.html#_what_is_nesfab
Presumably they made a best effort to implement their algorithm as closely as possible in the two languages, but that's a lot different than comparing the output of compilers using identical input source code.
So instead of saying
- "In a bizarre twist, my compiler generates faster code than GCC, LLVM, and every other compiler I compared it to."
It would be more accurate to say
- " In a bizarre twist, my compiler [and language] generate faster code than GCC, LLVM, and every other compiler I compared it to [using C].
extern char fn(void);
char kekw(void) {
const char foo = fn() ^ 5;
return foo - (foo & 3);
}
as a running example, and while godbolt only has cc6502, that thing indeed generates worse code, with helper subroutine calls and doing some strange stack juggling.Their example output
jsr fn
eor #5
ldx #3
sax S
sec
sbc S
rts
Is better but not drastically so than the output of another 6502 compiler on godbolt: _kekw_sloc0_1_0:
.ds 1
_kekw:
jsr _fn
eor #0x05
tax
and #0x03
sta *_kekw_sloc0_1_0
txa
sec
sbc *_kekw_sloc0_1_0
rts
A decent recent summary here showing a lot of various approaches over the years (doesn't include NesFAB though)
https://8bitworkshop.com/blog/compilers/higher-level-6502-pr...
A possibly interesting comparision of KickC vs CC65 vs VBCC vs 6502-gcc
https://sgadrat.itch.io/super-tilt-bro/devlog/219534/benchma...
My goal is to build an NES emulator. I will try using this code generator to check compatibility.
Source code: https://github.com/willtobyte/NES/blob/main/MOS6502.lua
There's really not a whole lot of code to generate, and your compiler host machine is a beast compared to the target machine. Focusing on making code generation faster makes almost zero sense in this case.