One of these architectures particularly struck me: the "dataflow architecture", in which there is no program counter and instructions execute truly concurrently, as soon as their operands are available.
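To make the firing rule concrete, here is a toy Python model. It is not the author's design, which is not published in the thread. There is no program counter: a node fires as soon as all of its operands exist, so independent nodes become ready in the same step. The graph and node names are hypothetical.

```python
from operator import add, mul, sub

# Hypothetical dataflow graph computing (a + b) * (c - d).
# Each node is (operation, list of input names).
GRAPH = {
    "sum": (add, ["a", "b"]),
    "diff": (sub, ["c", "d"]),
    "prod": (mul, ["sum", "diff"]),
}


def run_dataflow(graph, inputs):
    """Fire every node whose operands are all available; no program counter."""
    values = dict(inputs)   # tokens that have arrived so far
    pending = dict(graph)   # nodes that have not fired yet
    steps = []              # nodes that fired together in each step
    while pending:
        ready = [name for name, (_, srcs) in pending.items()
                 if all(s in values for s in srcs)]
        if not ready:
            raise RuntimeError("deadlock: some operands never arrive")
        for name in ready:  # all ready nodes fire in the same step
            op, srcs = pending.pop(name)
            values[name] = op(*(values[s] for s in srcs))
        steps.append(sorted(ready))
    return values, steps


values, steps = run_dataflow(GRAPH, {"a": 2, "b": 3, "c": 7, "d": 4})
print(values["prod"], steps)  # 15 [['diff', 'sum'], ['prod']]
```

In hardware, "sum" and "diff" could execute simultaneously because neither depends on the other. "prod" waits only for its two tokens, not for any instruction ordering.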
While trying to find a technical solution that implements this type of architecture effectively (spoiler: I may have succeeded, but I still have to test that it really works correctly!), I realized that this solution works more than well with three logic states. At that point I concluded that it is perhaps better to implement a CPU with three states (ternary) instead of a binary one, as a first step toward realizing my dataflow solution.
Obviously a whole world opened up to me regarding ternary CPUs. I discovered not only that they are the best solution for a computing device, but also that a growing number of university and research papers have been dealing with them recently.
This is also thanks to their enormous potential compared to normal CPUs: lower circuit complexity (and therefore lower power consumption and heat production), and at the same time truly formidable information density compared to their binary counterparts (each trit carries log2(3) ≈ 1.58 bits)!
So here I am, building this ternary CPU along with all the hardware and software needed to use it right away. You can find some more details at the link, but of course for questions, suggestions or anything else, feel free to comment here!
If you're mainly interested in exploring your ideas about ternary dataflow computing, is there a reason you didn't start with an FPGA instead? The gap between idea and prototype is much smaller, and there are even dataflow DSLs like Tapa if you don't want to write the HDL yourself:
Let's try to make sense of the code shown:
...
jsr println
anyi r3 r0 #1324
jsr println
anyi r3 r0 #1412
jsr println
hlt
r0 appears to be the zero register; "anyi" combines it with an immediate to load a pointer to the string to print into r3. The immediate field is 12 trits, so we have a maximum address range of 3^12 = 531441 (trytes?). The encoding of "jsr println" is always the same, so it uses absolute addresses, and the range is 3^20, slightly less than 3.5G. Or rather, it would be if negative addresses were allowed, since this appears to be balanced ternary (+--+00 = 144 decimal), which leaves only about half of that range for positive addresses.
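The numbers above can be checked with a short Python sketch. It assumes the usual balanced-ternary convention ('-' = -1, '0' = 0, '+' = +1, most significant trit first), which is what makes +--+00 come out as 144.

```python
# Balanced ternary: each trit is -1, 0 or +1, written '-', '0', '+'.
TRIT = {"-": -1, "0": 0, "+": 1}


def bt_to_int(s: str) -> int:
    """Decode a balanced-ternary string, most significant trit first."""
    value = 0
    for ch in s:
        value = value * 3 + TRIT[ch]
    return value


def bt_range(n_trits: int) -> tuple[int, int]:
    """Smallest and largest values representable with n balanced trits."""
    m = (3 ** n_trits - 1) // 2
    return -m, m


print(bt_to_int("+--+00"))    # 144
print(3 ** 12, bt_range(12))  # 531441 (-265720, 265720)
print(3 ** 20, bt_range(20))  # 3486784401 (-1743392200, 1743392200)
```

With 20 balanced trits only about 1.74G addresses are positive, so the full ~3.49G span is only reachable if negative addresses are used.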
The next bit of code is println; its address matches the one in the jsr.
println:
ld r2 0(r3) ;get char
jeq r2 r0 exit_println ;exit if nul
out 50(r0) r2 ;output to console0?
out 60(r0) r2 ;output to console1?
addi r3 r3 #4 ;next char - why add 4?
jmp println ;loop
exit_println:
anyi r4 r0 #10 ;ASCII LF
out 60(r0) r4 ;output to console1?
anyi r4 r0 #13 ;ASCII CR
out 60(r0) r4 ;output to console1?
jr r26 ;return (r26 = link register set by jsr)
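As a rough behavioural model, the routine above can be replayed in Python. This is only a sketch. It assumes each character sits in its own 4-tryte word, which is one plausible reason for the stride of 4. It models memory as a plain dictionary from tryte address to loaded value and uses a hypothetical two-character string.

```python
# Behavioural model of println (not cycle-accurate).
# Assumption: each character occupies its own 4-tryte word.
def println(memory, r3, ports):
    while True:
        r2 = memory.get(r3, 0)     # ld r2 0(r3)
        if r2 == 0:                # jeq r2 r0 exit_println
            break
        ports[50].append(r2)       # out 50(r0) r2
        ports[60].append(r2)       # out 60(r0) r2
        r3 += 4                    # addi r3 r3 #4: next 4-tryte word
    ports[60].append(10)           # ASCII LF, console1 only
    ports[60].append(13)           # ASCII CR, console1 only


# Hypothetical string "Hi" stored at 1324, one character per word.
memory = {1324: ord("H"), 1328: ord("i"), 1332: 0}
ports = {50: [], 60: []}
println(memory, 1324, ports)
print(ports)  # {50: [72, 105], 60: [72, 105, 10, 13]}
```

Note that, as written in the assembly, the line terminator only goes to port 60, so console0 never receives LF/CR.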
I'm not sure whether this is supposed to be a word-addressed machine, or whether it would be better if it actually were. The addressing unit appears to be the 6-trit "tryte", but loads only work when aligned to 4? That seems a bit (no pun intended) inconvenient for a ternary computer! Is there no "load tryte" instruction?
Jump instructions are encoded relative to the next instruction, with a 12-trit immediate. That seems fine, but a bit inconsistent with jsr using absolute addresses. Also, again, every instruction is 4 trytes, so we could drop the low bits and shift... ah, I'm forgetting that it's ternary, never mind.
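The following Python sketch illustrates the point about relative jumps. It assumes offsets are counted in trytes from the next instruction, as inferred above, and that the 12-trit field is balanced. Dropping the lowest trit divides by 3, not by 4, so the 4-tryte alignment of instructions cannot be exploited with a shift.

```python
# Sketch only: assumes jump offsets count trytes from the next instruction,
# every instruction is 4 trytes, and the 12-trit immediate is balanced.
INSTR_TRYTES = 4
IMM_MAX = (3 ** 12 - 1) // 2  # largest 12-trit balanced value: 265720


def rel_offset(pc: int, target: int) -> int:
    """Offset a jump at address `pc` must encode to land on `target`."""
    off = target - (pc + INSTR_TRYTES)  # relative to the next instruction
    if not -IMM_MAX <= off <= IMM_MAX:
        raise ValueError("target out of 12-trit range")
    return off


def trit_shift_right(x: int) -> int:
    """Drop the lowest balanced trit, i.e. round x / 3 to the nearest integer."""
    return (x + 1) // 3


off = rel_offset(100, 112)  # jump over two instructions
print(off, off // INSTR_TRYTES, trit_shift_right(off))  # 8 2 3
```

Storing the offset in units of instructions would therefore need a multiply by 4 at decode time rather than a trit shift, which is presumably the "never mind".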
A separate I/O space seems a bit un-RISCy. I personally happen to like x86, but even there it's considered a legacy feature nowadays.
---
This may sound harsh, but to me the architecture looks both fairly conventional, and at the same time like a bit of a mess, which would be fine for a hobby project, but not for something with a website making grandiose claims and clearly intending to be a commercial product.
- Yes, R0 is always 0.
- ANYI is a ternary function (ANY) that takes an immediate (ANYI). Used this way, it does nothing but load the immediate value into the register given as the first operand.
- Yes, it is balanced ternary, and negative addresses are of course interpreted correctly.
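The ANY truth table isn't given in the thread, so the following Python sketch assumes a trit-wise ANY equal to a + b clamped to {-1, 0, +1}. Any definition in which 0 is the identity would behave the same here: combining the all-zero r0 with an immediate simply yields the immediate. That is why anyi with r0 acts as a load-immediate.

```python
# Assumption: ANY is applied trit by trit as a + b clamped to {-1, 0, +1}.
# The real ANY truth table is not given in the thread.
def any_trit(a: int, b: int) -> int:
    return max(-1, min(1, a + b))


def any_word(xs: list[int], ys: list[int]) -> list[int]:
    """Apply ANY trit by trit to two trit vectors of equal length."""
    if len(xs) != len(ys):
        raise ValueError("operands must have the same number of trits")
    return [any_trit(a, b) for a, b in zip(xs, ys)]


r0 = [0] * 12                                 # zero register (12 trits shown)
imm = [0, 0, 0, 0, 0, 0, 1, -1, -1, 1, 0, 0]  # +--+00, MSB first, padded
assert any_word(r0, imm) == imm               # 0 is the identity: a plain load
print(any_trit(1, -1), any_trit(1, 1))        # 0 1
```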
>out 50(r0) r2 ;output on console0?
>out 60(r0) r2 ;output on console1?
- Exactly. 50 and 60 are the addresses of the two serial ports on the motherboard.
- R26 is the return register set by JSR
- Addressing is at the tryte level, and an instruction is always 4 trytes long. You are right that this seems a bit strange, but it is a compromise I had to accept, and I think it was the best solution to implement. (I am happy to note that your doubt was also a strong doubt of ours during the design phase; it means we are considering the same problems...)
- Instructions carry the size of the data to operate on as a suffix (as on the Motorola 68000), so there is LD.T (load tryte), LD.S (load short) and LD.W (load a whole 24-trit word). There are also instructions that load/store by addressing memory as a stack, for example PUSH.W (R20),R4
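The following Python sketch shows how size-suffixed loads could read a tryte-addressed memory. The 2-tryte size for LD.S and the little-endian tryte order are assumptions, not something stated in the thread. A 6-trit tryte is treated as one base-729 digit in the range -364..+364, so concatenating trytes still yields a correct balanced-ternary value.

```python
# Assumptions (not confirmed in the thread): a tryte is 6 balanced trits
# (-364..+364), LD.S reads 2 trytes, LD.W reads 4, and the lowest address
# holds the least significant tryte (little-endian).
TRYTE_BASE = 3 ** 6               # 729
SIZES = {"T": 1, "S": 2, "W": 4}  # trytes per access


def load(memory: dict[int, int], addr: int, size: str) -> int:
    """Model LD.T / LD.S / LD.W by combining trytes as base-729 digits."""
    value = 0
    for i in reversed(range(SIZES[size])):  # most significant tryte first
        value = value * TRYTE_BASE + memory.get(addr + i, 0)
    return value


mem = {0: 5, 1: -1, 2: 0, 3: 2}
print(load(mem, 0, "T"), load(mem, 0, "S"), load(mem, 0, "W"))
# 5 -724 774840254
```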
>Also again every instruction is 4 trytes, so we could drop the low bits and shift... ah, I'm forgetting that it's ternary, never mind.
You think I didn't think about it? ;)
- The I/O space is separate; in the end it cost me almost nothing, so I did it.
The RISC features I have kept are essentially two:
- Constant instruction length
- Memory access only through load/store
In fact, one of the RISC features I might remove in a future version is precisely load/store-only memory access. It is a core feature of RISC, but it can also hurt code density. That is something I will look at for the 48-trit architecture.
>This may sound harsh, but to me the architecture looks both fairly conventional, and at the same time like a bit of a mess, which would be fine for a hobby project, but not for something with a website making grandiose claims and clearly intending to be a commercial product.
- "Conventional", OK (even if none of the conventional architectures are ternary, but whatever...). But messy, what do you mean? The website says what the project actually is. I'm glad the claims sound grandiose, but that's how it is. And yes, we want to become a commercial product soon. If you have suggestions or other specific questions, I'm here to answer them ;)
The console I/O also seems to depend on external hardware to stall until the last character has actually been sent. Or was your prototype maybe clocked slowly enough that this wasn't necessary?
So, if the external data bus is always 24 trits but addressing is in units of 6 trits, how does the hardware (the CPU-to-memory interface) handle that? Does the address bus use binary instead of ternary?
What I meant by "conventional" is that other than using base 3, it isn't that different from other computer architectures, certainly not the idea you describe in the OP (that must have come later, since it isn't mentioned at all on your website). Some ternary operations might be useful for neural nets, but that wouldn't be an application for a general-purpose processor like this.
And some of the "messiness" might have been fixed, but again, there is absolutely no specification of what this architecture is supposed to be, so I was going from the one snippet of code that you have made public. And maybe you're still making changes, or do you intend to market a commercial microprocessor with absolutely no published specification?
Uhmm... actually the OUT instruction completes immediately after its normal cycle. The character is sent to the mainboard, which reads it, puts it in a buffer and sends it on from there. (It's more complex than that; if you want, we can discuss it.)
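One way to picture this is a bounded FIFO on the mainboard. The Python sketch below is hypothetical: the class name, capacity and full-buffer behaviour are all assumptions. It shows why OUT can finish immediately, and also that the earlier question remains open for when the buffer fills up faster than the serial port drains it.

```python
from collections import deque


class ConsoleBuffer:
    """Hypothetical mainboard-side console FIFO (capacity is an assumption)."""

    def __init__(self, capacity: int):
        self.fifo = deque()
        self.capacity = capacity
        self.sent = []

    def out(self, ch: int) -> bool:
        """CPU side: OUT just enqueues; returns False if the buffer is full."""
        if len(self.fifo) >= self.capacity:
            return False
        self.fifo.append(ch)
        return True

    def tick(self) -> None:
        """Serial side: transmit one buffered character per tick."""
        if self.fifo:
            self.sent.append(self.fifo.popleft())


con = ConsoleBuffer(capacity=2)
results = [con.out(c) for c in b"abc"]  # third write finds the buffer full
con.tick()
print(results, con.sent, list(con.fifo))  # [True, True, False] [97] [98]
```

A real design would have to choose what happens on a full buffer (stall the CPU, drop the character, or expose a status flag to poll).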
The external bus is 24 trits wide. The motherboard converts it by encoding it in binary, and the data is stored in 48-bit RAM. Unfortunately we don't have native ternary RAM yet, but it was the only way to have some memory and use the processor.
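Storing 24 trits in 48 bits suggests 2 bits per trit. A Python sketch of such an encoding follows; the specific bit patterns chosen for -, 0 and + are assumptions. For comparison, it also computes the minimum width of a dense binary encoding of the same word.

```python
import math

# Hypothetical 2-bits-per-trit code points; 0b11 is unused.
ENC = {-1: 0b10, 0: 0b00, 1: 0b01}
DEC = {v: k for k, v in ENC.items()}


def pack(trits: list[int]) -> int:
    """Pack trits (least significant first) into an integer, 2 bits each."""
    word = 0
    for i, t in enumerate(trits):
        word |= ENC[t] << (2 * i)
    return word


def unpack(word: int, n: int = 24) -> list[int]:
    """Inverse of pack for an n-trit word."""
    return [DEC[(word >> (2 * i)) & 0b11] for i in range(n)]


trits = [1, -1, 0] * 8  # 24 trits
word = pack(trits)
assert word < 2 ** 48 and unpack(word) == trits
# A dense binary encoding of 24 trits needs only ceil(log2(3**24)) bits.
print(math.ceil(math.log2(3 ** 24)))  # 39
```

The 2-bit scheme wastes 9 of the 48 bits compared with a dense encoding, but converting each trit independently is far simpler than a full base conversion.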
I don't quite understand which idea from the OP you're referring to. And I don't understand which ternary operations might be useful for neural networks (we've been in contact with Korean researchers who asked about this processor for neural networks).
The specifications obviously exist, but I don't think it's a good idea to post them here on a generic site! The marketing phase will probably take a while longer: as you may have read on the site, we're still fixing things here and there, and we need a marketing apparatus.
Re. memory: of course with no ternary RAM, you have to somehow encode the trits into bits. But I was more curious about the address bus. So if your memory is organized like this:
--- word #0
--0
--+
-0-
-00 word #1
-0+
-+-
-+0
-++ word #2
How does that map to binary addresses for the RAM chip(s)? If you had ternary RAM, would that change how you address it? Does address --0 map to the second-lowest/highest (depending on endianness) tryte in word #0, or to something entirely different?
Maybe you thought about this a lot more than I have and figured out an ingenious solution. But I would have gone with either word addresses or power-of-3 units.
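Under the layout drawn above, a tryte can be located in two steps: subtract the base address, then divide by 4 and take the remainder. The layout assumes 4 trytes per word, with word #0 at the lowest 3-trit address "---", i.e. -13. A Python sketch, with that layout as the only assumption:

```python
# Assumption: 4 trytes per word, word #0 starting at "---" (= -13).
TRIT = {"-": -1, "0": 0, "+": 1}


def bt(s: str) -> int:
    """Decode balanced ternary, most significant trit first."""
    v = 0
    for ch in s:
        v = v * 3 + TRIT[ch]
    return v


BASE = bt("---")  # -13, the lowest 3-trit address


def locate(addr: str) -> tuple[int, int]:
    """Return (word index, tryte within word) for a tryte address."""
    return divmod(bt(addr) - BASE, 4)


for a in ["---", "--0", "-00", "-++"]:
    print(a, bt(a), locate(a))
# --- -13 (0, 0)
# --0 -12 (0, 1)
# -00 -9 (1, 0)
# -++ -5 (2, 0)
```

The divide by 4 is the awkward part. It is just a shift once the address is binary, but there is no trit shift that does it while the address is still ternary.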
And how all of this ties into non-Von Neumann architectures, I still have no clue.
As for the bus addresses, the program counter starts from the lowest address. The mainboard obviously knows how much RAM is present and performs a translation to the first available low address. Nothing transcendental.
I didn't understand the connection with von Neumann architectures; this is a classic von Neumann architecture, only the information is in base 3...
Anyway, you seem to be a real expert, certainly more than I am (I'm mainly a programmer and had to learn these low-level things practically by myself); you could actively help the project...
>The mainboard obviously knows how much RAM is present and performs a translation to the first available low address.
How exactly is this done in hardware? I can't figure it out, so you must be the expert on that. Unless it's like a separate microcontroller doing div/mod in a loop to convert between the bases for every memory access, it couldn't be that, right? Right?
As for address management, as I said, the mainboard does it all, but I didn't bother going that far into the low-level details: it's all a simple VHDL function in an FPGA. The address already reaches the FPGA as "ternary-encoded binary" from external circuits.
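For illustration only, here is a Python sketch of the kind of conversion such a VHDL function would need. It assumes each trit arrives as a 2-bit code; the code points are hypothetical. Horner's rule multiplies by 3 as a shift plus an add, so for a fixed address width it unrolls into a fixed chain of adders rather than a div/mod loop.

```python
# Hypothetical 2-bit trit codes; the real encoding is not published.
DEC = {0b00: 0, 0b01: 1, 0b10: -1}


def ternary_encoded_to_int(word: int, n_trits: int) -> int:
    """Convert n_trits 2-bit trit codes (MSB trit in the highest bits)
    into an ordinary integer using Horner's rule."""
    value = 0
    for i in reversed(range(n_trits)):
        t = DEC[(word >> (2 * i)) & 0b11]
        value = (value << 1) + value + t  # value * 3 + t: shift plus add
    return value


# "+--+00" encoded MSB first as 01 10 10 01 00 00
print(ternary_encoded_to_int(0b011010010000, 6))  # 144
```

Rebasing to "the first available low address" would then be one more addition of a constant offset.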
The way I see it, cool hobby project, except you've already created a website promising next generation AI supercomputer chips, and then basically admit that you don't even know what goes on at the level of logic gates. And seem to avoid giving any technical details at all.
Designing a high-performance CPU is difficult enough to do in conventional binary logic, and is generally done by teams of people who know much more than you or I about all sorts of details on how to pipeline instruction execution efficiently, with branch prediction and speculative execution etc., and also the constraints imposed by manufacturing processes and physics itself.
You can't just assume someone can magically turn your ideas into such a CPU. And if they could, they could probably do it without you and whatever intellectual property you seem to be wanting to keep secret. Also, ternary being considered more efficient in some mathematical way doesn't necessarily mean an actual hardware implementation will be similarly efficient.
With the recent paper that apparently shows that ternary data used for AI can be faster and more energy efficient, I am intrigued that someone is experimenting with ternary in hardware.
I am a layman — can a conventional computer architecture be taken and ternary data and instructions grafted on to it? Or does the whole architecture need to be rebuilt from the ground up?
I guess I don't see, for example, any advantage to ternary addressing, only ternary data and operations.
I imagine it's possible, even if quite difficult, to merge the two architectures, but maybe it really is easier to start everything from scratch as far as hardware is concerned.
What do you mean by ternary addressing? The address bus is also ternary, so here too you get fewer wires and more available addresses!
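A quick Python comparison of bus widths: w ternary wires can carry 3^w distinct addresses versus 2^w for binary. Each ternary wire is therefore worth log2(3) ≈ 1.585 bits, at the cost of having to distinguish three signal levels per wire.

```python
def wires_needed(n_locations: int, base: int) -> int:
    """Smallest w with base**w >= n_locations, using exact integer math."""
    w, reach = 0, 1
    while reach < n_locations:
        w, reach = w + 1, reach * base
    return w


for n in (3 ** 12, 3 ** 20, 2 ** 32):
    print(n, wires_needed(n, 2), wires_needed(n, 3))
# 531441 20 12
# 3486784401 32 20
# 4294967296 32 21
```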
But I suppose searching on "ternary LLM" would find others.
Second question, what do the logic gates look like in a ternary system? Is there a list of them somewhere?
For ternary logic functions you can refer to this site that inspired us: https://homepage.cs.uiowa.edu/~jones/ternary/
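As a starting point, here is a Python sketch of the most common balanced-ternary gates. It uses the min/max/negation convention for AND/OR/NOT, which is an assumption, since the project's own gate set isn't published here. It also counts how many distinct one- and two-input functions exist, which shows how much larger the ternary design space is than the binary one.

```python
from itertools import product


def t_and(a: int, b: int) -> int:
    return min(a, b)   # ternary AND as minimum


def t_or(a: int, b: int) -> int:
    return max(a, b)   # ternary OR as maximum


def t_not(a: int) -> int:
    return -a          # balanced negation swaps - and +


SYM = {-1: "-", 0: "0", 1: "+"}
for a, b in product((-1, 0, 1), repeat=2):
    print(SYM[a], SYM[b], "AND", SYM[t_and(a, b)], "OR", SYM[t_or(a, b)])

# k-input functions: 3**(3**k) for ternary versus 2**(2**k) for binary.
print(3 ** 3, 3 ** (3 ** 2))  # 27 19683
print(2 ** 2, 2 ** (2 ** 2))  # 4 16
```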
Thanks for the help, maybe we need someone to write the basic software right now!
Discussed a few days ago here: https://news.ycombinator.com/item?id=42329307
Either way, great work. This is very fascinating.
Expect all the normal operations a binary computer/processor does (load, store, jumps, subroutines, etc.) with the addition of ternary logic functions. It's clear that the whole thing is encoded in balanced ternary :)
But, as the old saying goes, "out of sight is out of mind".
https://en.wikipedia.org/wiki/Content-addressable_memory#Ter...