I built SHDL (Simple Hardware Description Language) as an experiment in stripping hardware description down to its absolute fundamentals.
In SHDL, there are no arithmetic operators, no implicit bit widths, and no high-level constructs. You build everything explicitly from logic gates and wires, and then compose larger components hierarchically. The goal is not synthesis or performance, but understanding: what digital systems actually look like when abstractions are removed.
SHDL is accompanied by PySHDL, a Python interface that lets you load circuits, poke inputs, step the simulation, and observe outputs. Under the hood, SHDL compiles circuits to C for fast execution, but the language itself remains intentionally small and transparent.
This is not meant to replace Verilog or VHDL. It’s aimed at:
- learning digital logic from first principles
- experimenting with HDL and language design
- teaching or visualizing how complex hardware emerges from simple gates.
I would especially appreciate feedback on:
- the language design choices
- what feels unnecessarily restrictive vs. educationally valuable
- whether this kind of “anti-abstraction” HDL is useful to you.
Repo: https://github.com/rafa-rrayes/SHDL
Python package: PySHDL on PyPI
To make this concrete, here are a few small working examples written in SHDL:
1. Full Adder
component FullAdder(A, B, Cin) -> (Sum, Cout) {
    x1: XOR; a1: AND;
    x2: XOR; a2: AND;
    o1: OR;
    connect {
        A -> x1.A; B -> x1.B;
        A -> a1.A; B -> a1.B;
        x1.O -> x2.A; Cin -> x2.B;
        x1.O -> a2.A; Cin -> a2.B;
        a1.O -> o1.A; a2.O -> o1.B;
        x2.O -> Sum; o1.O -> Cout;
    }
}

2. 16-bit Register
# clk must be high for two cycles to store a value
component Register16(In[16], clk) -> (Out[16]) {
    >i[16]{
        a1{i}: AND;
        a2{i}: AND;
        not1{i}: NOT;
        nor1{i}: NOR;
        nor2{i}: NOR;
    }
    connect {
        >i[16]{
            # Capture on clk
            In[{i}] -> a1{i}.A;
            In[{i}] -> not1{i}.A;
            not1{i}.O -> a2{i}.A;
            clk -> a1{i}.B;
            clk -> a2{i}.B;
            a1{i}.O -> nor1{i}.A;
            a2{i}.O -> nor2{i}.A;
            nor1{i}.O -> nor2{i}.B;
            nor2{i}.O -> nor1{i}.B;
            nor2{i}.O -> Out[{i}];
        }
    }
}

3. 16-bit Ripple-Carry Adder
use fullAdder::{FullAdder};
component Adder16(A[16], B[16], Cin) -> (Sum[16], Cout) {
    >i[16]{ fa{i}: FullAdder; }
    connect {
        A[1] -> fa1.A;
        B[1] -> fa1.B;
        Cin -> fa1.Cin;
        fa1.Sum -> Sum[1];
        >i[2,16]{
            A[{i}] -> fa{i}.A;
            B[{i}] -> fa{i}.B;
            fa{i-1}.Cout -> fa{i}.Cin;
            fa{i}.Sum -> Sum[{i}];
        }
        fa16.Cout -> Cout;
    }
}

I was curious how the language compiles to C, what the resulting code does, and how one interacts with it. It took a while of reading to find, so maybe this could be linked from the places where compilation is mentioned. This part is my favorite; it's cool how it works. Especially since you mention "anti-abstraction", I like seeing how the DSL maps to C.
https://github.com/rafa-rrayes/SHDL/blob/master/docs/docs/ar...
> Compiles circuits to C so that they can run anywhere
Input (Base SHDL):
component Buffer(A) -> (B) {
    n1: NOT;
    n2: NOT;
    connect {
        A -> n1.A;
        n1.O -> n2.A;
        n2.O -> B;
    }
}
Output (C code):

#include <stdint.h>
#include <string.h>
typedef struct {
    uint64_t NOT_O_0;
} State;

static inline State tick(State s, uint64_t A) {
    State n = s;
    // NOT gate inputs
    uint64_t NOT_0_A = 0ull;
    NOT_0_A |= ((uint64_t)-( (A & 1u) )) & 0x1ull;
    NOT_0_A |= ((uint64_t)-( ((s.NOT_O_0 >> 0) & 1u) )) & 0x2ull;
    // Evaluate NOT gates
    n.NOT_O_0 = (~NOT_0_A) & 0x3ull; // 2 active lanes
    return n;
}

static inline uint64_t extract_B(const State *s) {
    return (s->NOT_O_0 >> 1) & 1ull; // B from lane 1
}
...

Here’s the core idea behind how SHDL compiles to C.
At compile time, SHDL groups all gates of the same type together and packs them into uint64_t bitfields. Each individual gate occupies exactly one bit. If there are more than 64 gates of a given type, multiple uint64_t's are used.
So, for example, if a circuit contains:
- 36 XOR gates
- 82 AND gates
- 1 NOT gate

then the compiler will generate:
- 1 uint64_t for XOR (36 bits used, rest unused)
- 2 uint64_t's for AND (64 + 18 bits)
- 1 uint64_t for NOT
Each of these integers represents the state of all gates of that type at once.
The generated C code then works lane-wise: during `tick()`, it computes the inputs for all gates of a given type simultaneously using bitwise operations, and then evaluates them in parallel. Because everything is packed, a single ~, &, |, or ^ operates on up to 64 gates at once.
So instead of iterating gate by gate, the simulation step becomes something like:
- build the input bitmasks
- apply one bitwise operation per gate type
- write the result back into the packed state
In other words, a full simulation step can advance dozens or hundreds of gates using just a handful of native CPU instructions. That’s the main reason the generated C code is both simple and fast.
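As a rough illustration of the idea (this is hand-written for clarity, not the exact code SHDL emits, and the gate count and variable names are made up), here is how three packed AND gates can be evaluated with a single bitwise operation:

#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch of lane-wise packing, not SHDL's actual output.
 * Three AND gates share one uint64_t; gate i lives in bit (lane) i. */
int main(void) {
    uint64_t and_in_a = 0, and_in_b = 0;   /* packed A and B inputs, one bit per gate */

    /* gate 0 computes 1 & 1, gate 1 computes 1 & 0, gate 2 computes 0 & 1 */
    and_in_a |= 1ull << 0;  and_in_b |= 1ull << 0;
    and_in_a |= 1ull << 1;
    and_in_b |= 1ull << 2;

    /* one native AND instruction evaluates all three gates at once */
    uint64_t and_out = and_in_a & and_in_b;   /* == 0b001: only gate 0 is high */

    for (int i = 0; i < 3; i++)
        printf("AND gate %d output: %llu\n", i,
               (unsigned long long)((and_out >> i) & 1));
    return 0;
}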
This also ties directly into the “anti-abstraction” idea, since there’s no hidden scheduler, no opaque simulator loop, and no dynamic dispatch. The DSL maps very explicitly to bit-level operations in C, and you can see exactly how a logical structure becomes executable code.
The final result is a compiled C shared library, which you can interact with from Python (or from anything else, if you want to build that binding yourself).
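For example, here is a minimal sketch of driving the generated Buffer code above directly from C, assuming it is saved as buffer_gen.c (a hypothetical filename) and included straight into a test program instead of going through the shared library or Python:

#include <stdio.h>
#include "buffer_gen.c"   /* hypothetical filename for the generated code shown above */

int main(void) {
    State s = {0};   /* all gate outputs start at 0 */
    s = tick(s, 1);  /* drive A = 1; n1 updates this tick */
    s = tick(s, 1);  /* n2 updates on the next tick (one gate level per tick) */
    printf("B = %llu\n", (unsigned long long)extract_B(&s));  /* prints B = 1 */
    return 0;
}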
I really appreciate you calling this out. Do you think I should make it clearer in the docs? Thanks again for the comment!
I think this is backwards. Knowing that a signal is the clock, the reset, a data-valid flag, or an adder result is far more important than knowing which gate drove it. The gates barely need names. Sadly, I think starting from that concept leads to a rather different language.
As you say, the communication interface is far more important than the gates. A true HDL can synthesize the gates for you, and indeed in an FPGA the gates don't really exist (LUTs instead). Optimization tools will further swirl the gates around once you start dealing with place and route - it may be more optimal to factor out common subexpressions, or "push bubbles" (invert OR/AND, De Morgan), or it may not.
The state of the art in embedded HDLs is Chisel (embedded in Scala), btw.
OP says "understanding: what digital systems actually look like when abstractions are removed", which is a reasonable teaching step, and they themselves are probably learning a lot in the process. But it's not all that useful for getting stuff done. It's like learning assembly language: useful for unlocking understanding in your head, useful to read occasionally, but tiring to actually write anything substantial in.
If you removed the explicit declaration of every gate in a preamble, followed by their wiring as a separate step, you could reduce the boilerplate a lot. The full-adder example could look like this:
component FullAdder(A, B, Cin) -> (Sum, Cout)
{
    A XOR B -> AxB
    A AND B -> AB
    AxB XOR Cin -> S
    (AxB AND Cin) OR AB -> C
    Sum: S
    Cout: C
}

https://drive.google.com/file/d/1dPj4XNby9iuAs-47U9k3xtYy9hJ...
honorable mention: https://www.funghisoft.com/mhrd
A comment like this could turn from a bad one to a good one if it were written more in the key of curiosity: what are the similarities or differences? what are some pointers for further development? and so on. If you know more than someone else does, that's great, but then please share some of what you know so we can all learn.
Telling somebody that their project which they've been pouring their passion and creativity into is merely reinventing some well-known thing that's been around for years is going to come across as a putdown even when it isn't intended that way. The effect is to shut down creativity and exploration, which is the opposite of what this place is supposed to be for.